Questions related to DrWatson

Hi, I just discovered the DrWatson package and I’m curious to see if I could use it in the future (to replace my “custom” package, which has worked OK so far but is not really clean/documented/extensible… :D).

With my package, each “experiment” is a succession of functions, each having its own parameter space. I typically run machine learning workflows where you first load/generate a dataset, then run some method on it, and then evaluate the results. So one experiment could be, for instance, to run 10 different methods (each with some combinations of parameters) on each of 5 datasets (possibly also with parameters) in order to compare their performance. How would you deal with that using DrWatson? Would I need to create a single main wrapper that dispatches the arguments to the different functions (which would mean that every time I add a new parameter, I also need to modify my main wrapper)?
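To make the structure concrete, here is a rough sketch of what I mean by "each stage has its own parameter space". It is written in plain Python purely for illustration and is not DrWatson code; the dataset names, method names, parameter names, and the helpers `expand` and `experiment_configs` are all made up for this example.

```python
from itertools import product


def expand(grid):
    """Expand {name: [values, ...]} into a list of {name: value} dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Hypothetical stages, each with its own independent parameter grid.
datasets = {
    "gaussian_mixture": {"n": [100, 1000], "k": [3]},
    "uniform": {"n": [100]},
}
methods = {
    "kmeans": {"k": [3, 5]},
    "spectral": {"k": [3], "sigma": [0.1, 1.0]},
}


def experiment_configs(datasets, methods):
    """Cross every dataset configuration with every method configuration."""
    configs = []
    for dname, dgrid in datasets.items():
        for dparams in expand(dgrid):
            for mname, mgrid in methods.items():
                for mparams in expand(mgrid):
                    configs.append(
                        {"dataset": (dname, dparams), "method": (mname, mparams)}
                    )
    return configs


configs = experiment_configs(datasets, methods)
```

The point is that the parameters stay grouped per stage, so adding a parameter to one method only touches that method's grid and not a central wrapper.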

Also, I read in the doc that it’s possible to dispatch experiments on a cluster, but how do you deal with the code stored on the cluster? I mean, locally you just instantiate the manifest file, but how do you ensure that the remote host also runs the same version of the code (especially if working with dev packages)? Or do you have custom code to make this automatic (of course it would depend on the interface of the cluster…)?

Also, in my setting the experiments are randomized, so I always need to run a given experiment X times. Is that supported somehow? Of course one can add a loop in the main function, but it would be great to store the results incrementally, so that it’s possible to “resume” an experiment which did not finish (e.g. because of a limited runtime) but still produced results for the first Y<X iterations.
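Roughly, the behaviour I have in mind looks like the sketch below. Again this is plain Python for illustration only, not DrWatson code: `run_once` is a hypothetical stand-in for one randomized run, and the one-JSON-file-per-repetition layout is just an assumption about how the results could be stored.

```python
import json
import random
from pathlib import Path


def run_once(config, seed):
    """Hypothetical stand-in for a single randomized run."""
    rng = random.Random(seed)
    return {"seed": seed, "score": rng.random() * config["scale"]}


def run_repetitions(config, n_reps, outdir):
    """Run n_reps repetitions, skipping those whose result file already exists.

    Returns the list of repetition indices that were actually executed.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    executed = []
    for rep in range(n_reps):
        path = outdir / f"rep_{rep:04d}.json"
        if path.exists():
            continue  # already computed in an earlier (possibly interrupted) job
        result = run_once(config, seed=rep)
        # Write to a temporary file, then rename, so a killed job
        # never leaves a half-written result behind.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(result))
        tmp.replace(path)
        executed.append(rep)
    return executed
```

Calling `run_repetitions` again with the same output directory would then only compute the repetitions that are still missing.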

Well, these would be my main questions! Just to be clear, I’m not criticizing at all; DrWatson seems to be a really cool package. I’m just wondering how my workflow could “fit” and what I would potentially miss.
