Julia workflow for research

I currently work as a researcher, and most of the software I write is used only by me.

So what is a good workflow for doing this?

I mainly:

  • create models
  • run simulations
  • plot the simulation results

I found that I do not need modules for this. My source code is structured as follows (simplified):

init.jl   # has all the using commands and loads the settings.yaml file
model1.jl
model2.jl
model3.jl
plot1.jl
plot2.jl
plot3.jl

tests.jl
main.jl

I have a script that starts Julia with the required parameters, automatically includes init.jl, opens a Julia prompt and prints a list of the available tests.
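A minimal sketch of how the list of available tests could be printed, assuming each test is a plain function whose name starts with `test` (like `test1()` below). `list_tests` and the two example tests are hypothetical names, not part of the original setup.

```julia
# Hypothetical helper for init.jl: collect every function whose name starts
# with "test" in the given module (the module this code runs in, by default).
function list_tests(m::Module = @__MODULE__)
    found = String[]
    for n in names(m; all = true)
        s = string(n)
        if startswith(s, "test") && isdefined(m, n) && getfield(m, n) isa Function
            push!(found, s)
        end
    end
    return sort(found)
end

# Example test functions, as they might be defined in tests.jl (hypothetical)
test_closed_loop() = println("running closed-loop test")
test_open_loop() = println("running open-loop test")

println("Available tests: ", join(list_tests(), ", "))
```

Because the names are looked up at call time, tests added to tests.jl show up after the next include without changing the helper.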

And in main.jl I include everything else.

After any change I run:
include("src/main.jl"); test1() # or any other test

I don’t need Revise for that.

Advantage: Visual Studio Code works very well; “Go to Definition” etc. always works, and there are no warnings. Using custom modules for my code always caused issues with VS Code.

Navigating the code is also easy, because each file is fairly small and it is easy for me to find the right one (in practice I do not use names like model1.jl but descriptive names like model_closed_loop.jl).

Disadvantage compared to using modules and Revise: I have to run include("src/main.jl") after every source code change. But for me that is very fast, about 0.11 s.
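The edit-and-re-include loop can be reproduced in a throwaway directory to see why Revise is not strictly needed here: re-including a file simply redefines the methods it contains. The file names and the function `gain` are hypothetical, and `@elapsed` only shows how a timing like the 0.11 s above can be measured.

```julia
# Simulate the "edit, then re-include main.jl" loop in a temporary directory.
# All file names and the function `gain` are hypothetical.
dir = mktempdir()
model = joinpath(dir, "model_closed_loop.jl")
write(model, "gain(x) = 2x\n")
# main.jl includes the model file; the path is resolved relative to main.jl
write(joinpath(dir, "main.jl"), "include(\"model_closed_loop.jl\")\n")

t = @elapsed include(joinpath(dir, "main.jl"))   # first load
println("first include took ", round(t; digits = 3), " s; gain(1) = ", gain(1))

write(model, "gain(x) = 3x\n")                   # "edit" the model file
include(joinpath(dir, "main.jl"))                # re-include: the method is redefined
println("after the edit: gain(1) = ", gain(1))
```

As far as I know, this works for functions but not for a changed `struct` definition in Julia versions before 1.12; such a change needs a restart (Revise cannot redefine it either).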

I use a custom system image containing all my main packages to keep the startup time below 5 s. I need this because ModelingToolkit and ControlSystems, which I use, are still quite slow to load. Perhaps this will no longer be needed with Julia 1.10.

Any comments?


I think there is nothing wrong with your workflow. It is not necessary to make everything into a module. I think a module (or package) becomes sensible if you find that you have standard problems which you tend to solve in more or less the same way.
At some point, it becomes annoying to copy/paste functions from previous projects, so it makes sense to re-use code.
If you do not encounter this, you probably do not need to change your habit.

This does not seem to be a big problem for you, but if you want to save those 0.11 s, you can still call Revise.includet("src/main.jl") once at the beginning of the session.
When my problem is structured similarly to yours, I like to keep a Julia file that I use as a notebook, i.e. with code cells separated by ## in this style:

using A,B,C
include("main.jl")
##
plot1(params)
##
println(data)
##
let #to avoid pollution of my global namespace
    a = 1
    b = 2
    x = DoSomeThing(a,b)
    plot2(x)
end
##
...

That being said, working this way, I often find that I need more code reuse in future projects, so I have developed a bunch of private packages for evaluation that help me avoid writing lots of similar code.

You haven’t mentioned it, but I assume that you are committing the Manifest.toml to version control. For this workflow I think that is the most important part, because at any point you can go right back and reproduce your plots exactly as they were.


No, not directly. Now and then, e.g. when I have finished a paper, I create and commit a renamed copy such as Manifest-1.9.toml, but on a daily basis I only put Project.toml into the git repository. I do try to keep a complete [compat] section in Project.toml, though. I use the code on different computers, and committing Manifest.toml would cause merge conflicts.
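The "renamed copy" approach can be scripted so the snapshot always gets a consistent name. This is a minimal sketch; `archive_manifest` and its `tag` keyword are hypothetical, and the default tag mimics the Manifest-1.9.toml naming used above.

```julia
# Hypothetical helper: snapshot the current Manifest.toml under a versioned
# name (e.g. at the end of a paper) so that the copy can be committed to git.
function archive_manifest(projdir::AbstractString = ".";
                          tag::AbstractString = "$(VERSION.major).$(VERSION.minor)")
    src = joinpath(projdir, "Manifest.toml")
    isfile(src) || error("no Manifest.toml found in $projdir")
    dst = joinpath(projdir, "Manifest-$(tag).toml")
    cp(src, dst; force = true)   # overwrite an older snapshot with the same tag
    return dst
end

# Usage from the project directory (writes e.g. Manifest-1.9.toml on Julia 1.9):
# archive_manifest(".")
```

To reproduce old results, the snapshot can be copied back to Manifest.toml, followed by `Pkg.instantiate()`. If I read the Pkg documentation correctly, recent Julia versions (1.10.8 and later) also pick up a file named like `Manifest-v1.11.toml` on their own, so `tag = "v1.11"` would follow that scheme.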

I told my professor that if he pulls from git to test my code, he should also run Pkg.resolve(). That works fine, and it would probably not work if Manifest.toml were in git.

If you always commit the changes to the manifest and pull regularly, there should be no problem (Julia will ask you to run ] instantiate). Of course, if you change it on two different computers and then push both, there will be merge conflicts, but in the worst case you can resolve them by deleting the Manifest.toml and running ] instantiate again.
I think it is really best to keep the manifest in version control. Before I started doing that, it was extremely annoying to go back to older projects when anything needed to be redone.

Well, for me a Project.toml with good compat bounds (with ~ in front) works better… If I get a merge conflict, e.g. because two people, or I myself on two different computers, added two different packages, I can resolve it manually. But I guess that is a matter of taste…
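To make the ~ remark concrete, here is a minimal sketch of which versions a full `X.Y.Z` [compat] entry allows with and without the tilde, following the caret/tilde rules described in the Pkg documentation. `compat_range` and `allows` are hypothetical helpers, not part of Pkg, and only three-part versions are handled.

```julia
# Hypothetical helpers (not part of Pkg): the half-open range [lo, hi) of
# versions allowed by a [compat] entry of the form "X.Y.Z" or "~X.Y.Z".
function compat_range(spec::AbstractString)
    tilde = startswith(spec, "~")
    v = VersionNumber(String(lstrip(spec, ['~', '^'])))
    hi = if tilde
        # tilde: patch-level updates only (exactly 0.0.z for a 0.0.z entry)
        v.major == 0 && v.minor == 0 ? VersionNumber(0, 0, v.patch + 1) :
                                       VersionNumber(v.major, v.minor + 1, 0)
    else
        # caret (the default): the first nonzero component must not change
        v.major > 0 ? VersionNumber(v.major + 1, 0, 0) :
        v.minor > 0 ? VersionNumber(0, v.minor + 1, 0) :
                      VersionNumber(0, 0, v.patch + 1)
    end
    return v, hi
end

allows(spec, ver) = (r = compat_range(spec); r[1] <= VersionNumber(ver) < r[2])
```

For example, `allows("1.2.3", "1.3.0")` is true, while `allows("~1.2.3", "1.3.0")` is false; the tilde form is the more conservative choice when no Manifest.toml is committed.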


There’s nothing wrong with your workflow, but you might also be interested in DrWatson for something more structured. Even if some of its details turn out to be too opinionated for your specific use case, there are many good ideas in that package.


If your workflow works for you, that’s all that matters.

That said, my personal preference is to

  1. Put all code in modules, which are packages. This not only lets me use Revise.jl, but also lets me write and run unit tests. I have found these invaluable: I test my code anyway, so all this amounts to is saving the tests in an organized way. For bigger projects I also have a “data” and a “results” git repository; simulations take a long time to run, so it is important to preserve results and to be able to compare them to earlier ones.

  2. I factor out functionality that I may reuse into private packages. Some of these are public but unregistered; if they end up as something coherent and useful, I clean them up and register them. This prevents a lot of silly copy-pasting between projects.

  3. I second @goerz’s suggestion for DrWatson.jl. It is non-intrusive and contains modular functionality for a lot of use cases. You just use what you need.

  4. I usually commit the Manifest.toml only for the actual package that was written for the research project.

  5. Despite best efforts, projects that run for years with multiple coauthors inevitably become an unholy mess. I make an effort to clean it up from time to time. This has helped me catch bugs, or optimize the code.
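Point 1 (saving the tests you run anyway, and comparing results with earlier runs) could look roughly like this with the `Test` and `Serialization` standard libraries. The model `simulate` and the file names are hypothetical, and a temporary directory stands in for the separate "results" repository.

```julia
using Test, Serialization

# Hypothetical model: explicit Euler simulation of dx/dt = -k*x.
function simulate(k; x0 = 1.0, dt = 0.01, nsteps = 100)
    xs = Vector{Float64}(undef, nsteps + 1)
    xs[1] = x0
    for i in 1:nsteps
        xs[i + 1] = xs[i] - dt * k * xs[i]
    end
    return xs
end

# A temporary directory stands in for a separate "results" repository.
resultsdir = mktempdir()
reffile = joinpath(resultsdir, "simulate_k1.jls")
isfile(reffile) || serialize(reffile, simulate(1.0))   # store a reference run once

@testset "simulate" begin
    xs = simulate(1.0)
    @test length(xs) == 101
    @test xs[1] == 1.0
    @test all(diff(xs) .< 0)            # the solution decays monotonically
    @test xs ≈ deserialize(reffile)     # unchanged compared with the stored run
end
```

Note that `Serialization` files are generally not readable across Julia versions, so for results that must survive for years a plain format such as CSV may be the safer choice.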


I’m not sure what Revise and modules have to do with each other. Use Revise.includet to include your scripts.


For research I tend to start with code in a script in global scope and “graduate” the code when it’s worth it:

Global scope (with ## blocks) → function → function in a separate file → private package → registered package.
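The first step of that path (global scope → function) might look like the following sketch; the toy decay model and the name `simulate_decay` are hypothetical. The function version can then move into its own file and later into a private package without further changes.

```julia
# Stage 1: exploratory code in global scope, e.g. inside one ## cell.
# Explicit Euler steps for dx/dt = -x (a hypothetical toy model).
dt = 0.01
x = 1.0
for _ in 1:100
    global x += dt * (-x)   # `global` is needed when this runs from a file
end

# Stage 2: the same computation "graduated" into a function; the former
# globals become arguments, so nothing leaks into the global namespace.
function simulate_decay(x0; dt = 0.01, nsteps = 100)
    x = x0
    for _ in 1:nsteps
        x += dt * (-x)
    end
    return x
end

println(x, " == ", simulate_decay(1.0))
```

Both versions perform exactly the same floating-point operations, so they return identical results; the function is just easier to call with other parameters and to test.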