DrWatson - the perfect sidekick to your scientific inquiries!

theogf · April 18, 2019, 2:18pm

This is just amazing! I started to work on the exact same idea but ended up giving up!

An additional feature I had in mind was a GUI (based on Electron and Interact.jl) to prepare the simulation runs. This would include creating/editing a template and saving/loading specific config files.
I would be happy to contribute in any case!

Datseris · April 18, 2019, 2:22pm

Great, good to have you on board!

I don’t fully understand your suggestion, so I think it is best to open a feature request issue to explain it in detail! I do want to comment, though, that Electron+Interact are quite heavy dependencies and are also not pure-Julia dependencies, which is something one should always think twice about before adding. But of course it could be worth it.

foobar_lv2 · April 18, 2019, 8:31pm

You tackle stashing files that don’t permit metadata in their format: Generate a filename that encodes parameter values. That kind of scheme has a second part: One needs to parse back the filename into its parameter values.

When I do this kind of thing in a project, then it is very annoying to generate a parameterset->filename mapping, plus regex to reconstruct the parameterset from the filename, in a way that is still human readable and does not lead to extremely long names. This is a giant ugly kludge, and kudos for trying to deal with it for us.

While I saw that you tackle name generation, I did not see any mention of parsing of names in the docs. Is that supported?
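To make the round trip concrete, here is a minimal sketch in Python of the two halves described above: building a human-readable filename from a parameter set and parsing it back. The helper names `params_to_name` and `name_to_params` are hypothetical and this is not DrWatson's implementation. It assumes keys and string values contain no `_` or `=`, and that any value that looks like a number should be read back as one.

```python
def params_to_name(params, suffix="dat"):
    """Encode a parameter dict as 'key=value_key=value.suffix' (keys sorted)."""
    parts = [f"{key}={params[key]}" for key in sorted(params)]
    return "_".join(parts) + "." + suffix


def _parse_value(text):
    """Try int, then float; fall back to the raw string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def name_to_params(name):
    """Invert params_to_name: return (params, suffix)."""
    # Split on the LAST dot so that float values such as 0.5 survive.
    stem, _, suffix = name.rpartition(".")
    params = {}
    for part in stem.split("_"):
        key, _, value = part.partition("=")
        params[key] = _parse_value(value)
    return params, suffix


params = {"N": 100, "alpha": 0.5, "model": "ising"}
name = params_to_name(params)
recovered, suffix = name_to_params(name)
```

The fragile part is the parser: once keys may contain the separator, or values are nested, a simple split no longer suffices, which is why this scheme tends to turn into the kludge described above.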

Datseris · April 18, 2019, 9:13pm

Thank you very much for your kind words. Parsing of names is not yet implemented, although I have always had it in mind. I believe this is functionality we should have, and it should also be easy to implement.

We just didn’t have the manpower to do it until now. I’ve opened an issue that summarizes the process: https://github.com/JuliaDynamics/DrWatson.jl/issues/38. Contributions would be super welcome; otherwise I will do it as time permits!

grero · April 19, 2019, 12:10am

Interestingly, this is something I tried to tackle, though inelegantly, in my DataProcessingHierarchyTools package. I basically needed a way to navigate a directory structure in which the directory names define a certain level of analysis. In my particular case, I am analysing neural data, and I have some analyses that run on an entire session, some on arrays of recording channels for that session, and some on individual cells. This tool allows me to automatically navigate to the appropriate level by defining a level parameter attached to each analysis type.
What I ended up doing for parameters was simply to attach a hash of those parameters to the file name, so that when I run an analysis with identical arguments, the results are simply loaded. Of course, this means that I can’t tell what the arguments were simply by looking at the filename. Anyway, DrWatson seems to be a much more polished version of this, and as I said before, I’ll try to integrate it into my workflow.
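The hash-based scheme can be sketched roughly as follows, in Python. This is not the code from DataProcessingHierarchyTools; the names `params_hash` and `run_or_load` are hypothetical, and it assumes the parameters and the result are JSON-serializable. Storing the parameters inside the file is an added assumption, meant to soften the drawback that the filename alone no longer tells you what the arguments were.

```python
import hashlib
import json
import os


def params_hash(params, length=8):
    """Short, deterministic hash of a parameter dict (key order is irrelevant)."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:length]


def run_or_load(prefix, params, compute, directory):
    """Load a cached result if one exists for these parameters, else compute it."""
    path = os.path.join(directory, f"{prefix}_{params_hash(params)}.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["result"]
    result = compute(**params)
    # Keep the parameters next to the result so the file documents itself.
    with open(path, "w") as f:
        json.dump({"params": params, "result": result}, f)
    return result
```

Calling `run_or_load` twice with the same arguments runs `compute` once; the second call only reads the file.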

ValdarT · April 19, 2019, 8:14am

DVC works quite nicely for me for versioning large files using S3 as the storage. It’s clearly focused on predictive modelling workflows but the versioning system is generic so it could fit different use cases.

theogf · April 23, 2019, 11:15am

This is what I had in mind (this is WIP)
That would be the template creator. Then one could just select parameters and save them in a config file.

Datseris · April 23, 2019, 11:24am

Hey, this seems cool. But can you explain its purpose? What is this GUI supposed to achieve? (i.e. what does one do with the saved config file?)

Also, what are all these fields, like field name? I can see that the field name you wrote has a space, so it can’t be a Julia variable.

theogf · April 23, 2019, 11:29am

Well the idea would be to first have a template config file for a project.
Then one could create specific config files for each experiment that would directly be fed to the workspace to run an experiment, similarly to your dict_list function.
In my experience it is easier and less error-prone to select parameters in a GUI than to manipulate dictionaries directly. It also allows the config file to be saved in the results folder.

And the fields are not directly Julia variables. They would simply be field names.
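For readers unfamiliar with the `dict_list` idea mentioned above: it expands one dictionary whose values may be lists into one dictionary per parameter combination. A rough Python analogue is sketched below, with the hypothetical name `expand_grid`. It illustrates the concept only and is not DrWatson's code; in particular it treats only Python lists as "values to iterate over".

```python
from itertools import product


def expand_grid(config):
    """Expand a dict with list-valued entries into a list of plain dicts."""
    keys = list(config)
    # Scalars are wrapped in a one-element list so product() treats them uniformly.
    axes = [value if isinstance(value, list) else [value] for value in config.values()]
    return [dict(zip(keys, combo)) for combo in product(*axes)]


experiments = expand_grid({"a": [1, 2], "b": "x", "c": [0.1, 0.2]})
```

Each element of `experiments` is a complete parameter set that can be handed to one simulation run, which is the role the per-experiment config files would play in the GUI workflow.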

Datseris · April 23, 2019, 11:41am

I see, but to really understand this, I would still need a usage demonstration or at least an explanation. What do you do with the config file? How do you use it? What is the config file? Is it XML, Julia, TOML? What’s its type? How do you actually use it in a simulation? Also, don’t you need to write a special parser for this to work?

For the dictionary all these questions are immediately answered since it is a basic Julia structure.

Yes this is a valid point, but one should consider that you may need to do these things over a cluster, or a cloud, or any other connection that won’t be able to support this. This is an advantage of the dictionary approach. A second advantage is that it works consistently with any conceivable type, existing or not (due to how we handle Vector subtypes). A final point is simply that Electron is a very heavy dependency.

Please notice: I am not bashing you or anything. From personal experience, the best way to improve something is to be as critical as you can, which is what I do here.

Do you have the code for this somewhere?

theogf · April 23, 2019, 1:41pm

Thanks for the helpful feedback. I did not think the whole thing through, but my pipeline idea would be:
Create Template Config File → Save as JSON (contains field name, default value and limits/options)
Create Config File → Open existing Template file → Set values → Save as a Dict (where field names are keys) in JSON.
The config file can then be directly fed by being read as a Dict.
In short it is simply a practical (?) config file GUI maker
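Without the GUI, the pipeline above could look roughly like this in Python. The template layout (a `fields` list with `name`, `default`, and optional `limits` or `options`) and the helper `make_config` are assumptions made for illustration; they are not taken from MLExps.jl.

```python
import json


def make_config(template, overrides=None):
    """Build a flat {field name: value} config from a template and chosen values."""
    overrides = overrides or {}
    known = {field["name"] for field in template["fields"]}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"fields not in template: {sorted(unknown)}")
    config = {}
    for field in template["fields"]:
        name = field["name"]
        value = overrides.get(name, field["default"])
        if "options" in field and value not in field["options"]:
            raise ValueError(f"{name!r}: {value!r} is not one of {field['options']}")
        if "limits" in field:
            low, high = field["limits"]
            if not low <= value <= high:
                raise ValueError(f"{name!r}: {value!r} is outside [{low}, {high}]")
        config[name] = value
    return config


# Hypothetical template; field names are plain strings, so spaces are fine.
template = {
    "fields": [
        {"name": "learning rate", "default": 0.01, "limits": [0.0, 1.0]},
        {"name": "kernel", "default": "rbf", "options": ["rbf", "linear"]},
    ]
}
config = make_config(template, {"kernel": "linear"})
config_json = json.dumps(config)   # what would be saved to disk
loaded = json.loads(config_json)   # what an experiment would read back as a dict
```

The GUI would then only be a front end for choosing the overrides; the saved file stays a plain dictionary.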

I advanced a bit more to make things maybe more clear

As for the code, it’s pretty ugly so far, but you can still check it out if you want:
https://github.com/theogf/MLExps.jl
As you see I was going in the same direction as you did. You can simply check src/gui_config.jl and test/test_gui.jl

Datseris · April 23, 2019, 2:06pm

Thanks for the response! To keep this post as on-topic as possible, I’ve continued further points in the repo you shared: https://github.com/theogf/MLExps.jl/issues/1

wlandau · June 21, 2019, 11:25am

Author of drake chiming in here. TL;DR: I think the most direct apples-to-apples comparisons here are DrWatson vs MLFlow and drake vs GNU Make.

Is this something similar to https://ropensci.github.io/drake/

DrWatson has different goals: version control, sharing, project file structure, and provenance (keeping track of simulation settings). In these respects, it is more like MLFlow, especially when it comes to tracking. drake, on the other hand, tries to be Make for R. drake synchronizes expensive computations in an end-to-end pipeline so repeated full runs take minimal time. drake analyses your targets and functions to figure out what needs to run and what can be skipped, taking into account that some computational steps depend on others.

I will say that DrWatson and drake are similar in that they both abstract away output file management and reduce manual bookkeeping.

No, from the 6-minute video I just watched they don’t seem similar. DrWatson also seems to have a much simpler and cleaner approach to helping you (e.g. you don’t have to make a “DataFrame” out of every function in your code!)

I disagree with that characterization of the data frame (the drake plan). You do not need to wrap up all your functions in it. In drake, you can define your supporting functions wherever and however you want, and then your commands in the plan simply reference those functions as needed. Most of your code lives in the functions, as is the case for cleanly-implemented scientific workflows in general, drake or no drake.

The drake plan is like a Makefile for R. The main differences are:

You are writing R code.
The syntax is much easier than Make wildcards.
You do not need to list out all the dependencies of each target by hand. drake automatically discovers dependency relationships among your targets and functions using static code analysis. (See the graph above.)
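As a rough illustration of the Make-like idea of skipping up-to-date targets, here is a small Python sketch. It is not drake and does not do static code analysis: dependencies are declared by hand, and a manual version tag stands in for detecting that a function's code changed. The names `make` and `fingerprint` are hypothetical, and every dependency is assumed to be a target in the plan.

```python
import hashlib
from graphlib import TopologicalSorter  # Python 3.9+


def fingerprint(*parts):
    """Hash the repr of the given parts into a single hex digest."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(repr(part).encode("utf-8"))
    return digest.hexdigest()


def make(plan, cache):
    """Run targets in dependency order, skipping those whose inputs are unchanged.

    plan:  {target: (function, [dependency targets], version_tag)}
    cache: {target: (fingerprint, value)}, updated in place between runs
    """
    graph = {target: set(deps) for target, (_, deps, _) in plan.items()}
    values, rebuilt = {}, []
    for target in TopologicalSorter(graph).static_order():
        func, deps, tag = plan[target]
        inputs = [values[dep] for dep in deps]
        key = fingerprint(tag, inputs)
        if target in cache and cache[target][0] == key:
            values[target] = cache[target][1]   # up to date: reuse cached value
        else:
            values[target] = func(*inputs)      # out of date: recompute
            cache[target] = (key, values[target])
            rebuilt.append(target)
    return values, rebuilt


plan = {
    "raw": (lambda: [1, 2, 3], [], "v1"),
    "total": (lambda raw: sum(raw), ["raw"], "v1"),
    "report": (lambda total: f"total={total}", ["total"], "v1"),
}
cache = {}
values, first_run = make(plan, cache)
_, second_run = make(plan, cache)
```

On the second call nothing is rebuilt, because every target's tag and upstream values are unchanged. Changing one target's tag rebuilds that target, and downstream targets rerun only if its value actually changes.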

kevbonham · June 21, 2019, 11:37am

I’m not familiar with drake, but is it analogous to snakemake? There’s also Makeitso.jl… wondering where that fits in.

wlandau · June 21, 2019, 12:39pm

Yes, drake is much more similar to snakemake and Makeitso.jl.

Datseris · June 23, 2019, 3:22pm

Hi, @wlandau, thanks for chiming in. Welcome to the Julia discourse! Glad that this post somehow made its way to you so we can get a fairer representation of the other software mentioned here!

Okay, I can accept this. I didn’t spend a lot of time learning drake, so I may not be accurate in my judgement. Be aware, though, that the workflow dependency graph you showcase in your original comment is already too complicated for me as a working scientist, and also compared to what DrWatson needs to achieve. I think we just have different goals and/or target groups.

Correct, but let’s not forget the “Naming Simulations” part (see the Functionality page), which is actually what I personally use most often from DrWatson!
