This is just amazing! I started working on the exact same idea but ended up giving up!
An additional feature I had in mind was a GUI (based on Electron and Interact.jl) to prepare the simulation runs. This would include creating/editing a template, and saving/loading specific config files.
I would be happy to contribute in any case!
I don’t fully understand your suggestion, so I think it is best to open a feature request issue to explain it in detail! I do want to comment, though, that Electron + Interact are quite heavy dependencies, and also not pure-Julia dependencies, which is something one should always think twice about before adding. But of course it could be worth it.
You tackle stashing files whose format doesn’t permit metadata: generate a filename that encodes the parameter values. That kind of scheme has a second part: one needs to parse the filename back into its parameter values.
When I do this kind of thing in a project, it is very annoying to generate a parameter-set→filename mapping, plus a regex to reconstruct the parameter set from the filename, in a way that is still human-readable and does not lead to extremely long names. This is a giant ugly kludge, and kudos for trying to deal with it for us.
While I saw that you tackle name generation, I did not see any mention of parsing of names in the docs. Is that supported?
Thank you very much for your kind words. Parsing of names is not yet implemented, but I have always had it in mind. I believe this is functionality we should have, and it should also be easy to implement.
We just haven’t had the manpower to do it until now. I’ve opened an issue that summarizes the process: https://github.com/JuliaDynamics/DrWatson.jl/issues/38 Contributions would be super welcome; otherwise I will do it as time permits!
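To make the idea concrete, here is a rough sketch of what such a name round trip could look like. This is not DrWatson’s API; the helper names and the filename scheme are invented for illustration:

```julia
# Hypothetical sketch of a name round trip; NOT DrWatson's actual API.
# Assumes names of the form "alg=rk4_dt=0.01.bson".

function make_name(params::Dict; ext = "bson")
    parts = [string(k, "=", v) for (k, v) in sort(collect(params); by = first)]
    return join(parts, "_") * "." * ext
end

function parse_name(name::AbstractString)
    base = first(splitext(name))
    params = Dict{String,Any}()
    for part in split(base, "_")   # breaks if a value itself contains '_'!
        k, v = split(part, "="; limit = 2)
        num = tryparse(Float64, v)          # recover numbers where possible
        params[String(k)] = num === nothing ? String(v) : num
    end
    return params
end

parse_name(make_name(Dict("dt" => 0.01, "alg" => "rk4")))
# Dict("dt" => 0.01, "alg" => "rk4")
```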
Interestingly, this is something I tried to tackle, though inelegantly, in my DataProcessingHierarchyTools package. I basically needed a way to navigate a directory structure in which the directory names define a certain level of analysis. In my particular case, I am analysing neural data, and I have some analyses that run on an entire session, some on arrays of recording channels for that session, and some on individual cells. This tool allows me to automatically navigate to the appropriate level by defining a level parameter attached to each analysis type.
What I ended up doing for parameters was simply to attach a hash of those parameters to the file name, so that when I run an analysis with identical arguments, the results are simply loaded. Of course, this means that I can’t tell what the arguments were just by looking at the filename. Anyway, DrWatson seems to be a much more polished version of this, and as I said before, I’ll try to integrate it into my workflow.
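A minimal sketch of that hash-based caching idea, assuming Julia’s built-in `hash` and the Serialization stdlib; the function and directory names are made up:

```julia
using Serialization  # stdlib

# Hypothetical cache-by-hash helper: run `f(params)` once per distinct
# parameter set, and reload the stored result on repeated calls.
function cached_analysis(f, params::Dict; dir = "results")
    mkpath(dir)
    file = joinpath(dir, string(hash(params), ".jls"))
    isfile(file) && return deserialize(file)  # identical arguments: just load
    result = f(params)
    serialize(file, result)                   # filename reveals only the hash
    return result
end

cached_analysis(p -> sum(values(p)), Dict(:a => 1, :b => 2))
```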
DVC works quite nicely for me for versioning large files, using S3 as the storage backend. It’s clearly focused on predictive-modelling workflows, but the versioning system is generic, so it could fit different use cases.
Well, the idea would be to first have a template config file for a project.
Then one could create specific config files for each experiment, which would be fed directly to the workspace to run an experiment, similarly to your dict_list function.
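(For reference, my understanding of dict_list is that it expands vector-valued entries into every parameter combination; the parameter names below are just an example:)

```julia
using DrWatson

allparams = Dict(:model => ["svm", "tree"], :C => [0.1, 1.0], :seed => 42)
dicts = dict_list(allparams)
# 4 Dicts, one per (model, C) combination, each carrying :seed => 42
```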
From my experience, I find it easier and less error-prone to select parameters in a GUI instead of manipulating dictionaries directly. It also allows the config file to be saved in the results folder as well.
And the fields are not directly Julia variables; they would simply be field names.
I see, but to really understand this, I would still need a usage demonstration, or at least an explanation. What do you do with the config file? How do you use it? What is the config file: is it XML, Julia, TOML? What’s its type? How do you actually use it in a simulation? Also, don’t you need to write a special parser for this to work?
For the dictionary all these questions are immediately answered since it is a basic Julia structure.
Yes, this is a valid point, but one should consider that you may need to do these things over a cluster, a cloud, or any other connection that won’t be able to support a GUI. This is an advantage of the dictionary approach. A second advantage is that it works consistently with any conceivable type, existing or not (due to how we handle Vector subtypes). A final point is simply that Electron is a very heavy dependency.
Please note: I am not bashing you or anything. From personal experience, the best way to improve something is to be as critical as you can, which is what I am doing here.
Thanks for the helpful feedback. I have not thought the whole thing through, but my pipeline idea would be:
Create Template Config File → Save as JSON (contains field names, default values, and limits/options).
Create Config File → Open an existing template file → Set values → Save as a Dict (with field names as keys) in JSON.
The config file can then be fed in directly by reading it back as a Dict.
In short, it is simply a practical (?) config-file GUI maker.
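A minimal sketch of how such a config could be read back, assuming JSON.jl; the file name and fields are invented:

```julia
using JSON  # assumes JSON.jl is installed

# Hypothetical config written by the GUI:
# {"learning_rate": 0.01, "n_layers": 3, "dataset": "mnist"}
config = JSON.parsefile("experiment_config.json")  # -> Dict{String,Any}

# ...which can then be consumed like any other parameter dictionary,
# e.g. passed on to DrWatson-style naming or to the experiment itself.
```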
I have advanced a bit more, to hopefully make things clearer.
The code is pretty ugly so far, but you can still check it out if you want: https://github.com/theogf/MLExps.jl
As you can see, I was going in the same direction as you did. You can simply check src/gui_config.jl and test/test_gui.jl.
DrWatson has different goals: version control, sharing, project file structure, and provenance (keeping track of simulation settings). In these respects, it is more like MLFlow, especially when it comes to tracking. drake, on the other hand, tries to be Make for R. drake synchronizes expensive computations in an end-to-end pipeline so repeated full runs take minimal time. drake analyses your targets and functions to figure out what needs to run and what can be skipped, taking into account that some computational steps depend on others.
I will say that DrWatson and drake are similar in that they both abstract away output file management and reduce manual bookkeeping.
No, from the 6-minute video I just watched, they don’t seem similar. DrWatson also seems to have a much simpler and cleaner approach to helping you (e.g. you don’t have to make a “DataFrame” out of every function in your code!).
I disagree with that characterization of the data frame (the drake plan). You do not need to wrap all your functions up in it. In drake, you can define your supporting functions wherever and however you want, and then the commands in your plan simply reference those functions as needed. Most of your code lives in the functions, as is the case for cleanly implemented scientific workflows in general, drake or no drake.
The drake plan is like a Makefile for R. The main differences are:
You are writing R code.
The syntax is much easier than Make wildcards.
You do not need to list out all the dependencies of each target by hand. drake automatically discovers the dependency relationships among your targets and functions using static code analysis. (See the graph above.)
Hi, @wlandau, thanks for chiming in. Welcome to the Julia discourse! Glad that this post somehow made its way to you so we can get a more fair representation of the other software mentioned here!
Okay, I can accept this. I didn’t spend a lot of time learning drake, so I may not be accurate in my judgement. Be aware, though, that the workflow dependency graph you showcase in your original comment is already too complicated for me as a working scientist, and also compared to what DrWatson needs to achieve. I think we just have different goals and/or target groups.
Correct, but let’s not forget the “Naming Simulations” part (see the Functionality page), which is actually what I personally use most often from DrWatson!
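(For anyone landing here, a minimal savename example; the parameter names are invented, and the exact key ordering in the output may differ:)

```julia
using DrWatson

params = Dict(:alg => "rk4", :dt => 0.01, :T => 100)
savename(params)                  # e.g. "T=100_alg=rk4_dt=0.01"
savename("sim", params, "bson")   # e.g. "sim_T=100_alg=rk4_dt=0.01.bson"
```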