Workflow tips for small-team academic projects

@Tamas_Papp Thanks for that git workflow guide. It’s helpful. As for package template generators, my understanding is that both yours and the other one enforce one git repository per package. That makes sense if the packages will eventually be in a registry (although, judging by LocalRegistry.jl, which @grero recommended, and Julia Lang’s own stdlib repo, that doesn’t look like a requirement), but not in my case.

DrWatson, recommended by @Skoffer, seems like a really nice and modular tool (as in, it doesn’t come bundled with a “way of thinking”). The default and un-customizable folder structure is a bit restrictive, but you don’t have to use it. From a setup perspective, its main contribution seems to be the @quickactivate macro, which ensures that any script containing it, whatever subfolder it’s in, will run in your project environment, rather than whatever the currently activated environment is. As I understand it, this gives you enforceable reproducibility even for code outside a package, right? By enforceable, I mean that it doesn’t require you to remember to activate a particular environment.
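To make the idea concrete, here is a rough sketch of the behaviour described above using only the Pkg standard library. It is not DrWatson’s implementation: `find_project_root` and `activate_enclosing_project` are hypothetical names, and the sketch assumes the project is marked by a file named Project.toml somewhere above the script.

```julia
using Pkg

# Hypothetical helper (not DrWatson's API): walk upward from `start` until a
# directory containing Project.toml is found; return `nothing` if none exists.
function find_project_root(start::AbstractString)
    dir = abspath(start)
    while true
        isfile(joinpath(dir, "Project.toml")) && return dir
        parent = dirname(dir)
        parent == dir && return nothing   # reached the filesystem root
        dir = parent
    end
end

# Activate the project that encloses `start`, or fail loudly if there is none.
function activate_enclosing_project(start::AbstractString)
    root = find_project_root(start)
    root === nothing && error("no Project.toml found above $start")
    Pkg.activate(root)
    return root
end
```

A script in any subfolder could then start with `activate_enclosing_project(@__DIR__)`, so it would not depend on which environment happened to be active when it was launched.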

There is actually another feature of DrWatson which makes it really useful for research: the ability to store “results” with metadata about the code version (git commit id) that produced them!
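The general idea can be sketched without DrWatson, assuming the `git` command-line tool is on the PATH. The names `current_commit` and `tag_result` are hypothetical, and DrWatson’s own saving functions do more than this (and should be preferred if you use it).

```julia
# Hypothetical helper: return the commit id of the repository containing `dir`,
# or `nothing` if git is unavailable or `dir` is not inside a repository.
function current_commit(dir::AbstractString = pwd())
    try
        return readchomp(pipeline(`git -C $dir rev-parse HEAD`; stderr = devnull))
    catch
        return nothing
    end
end

# Return a copy of `result` with the commit id attached under "gitcommit".
# The input dictionary is left unmodified.
function tag_result(result::AbstractDict, dir::AbstractString = pwd())
    tagged = Dict{String,Any}(string(k) => v for (k, v) in result)
    tagged["gitcommit"] = current_commit(dir)
    return tagged
end
```

For example, `tag_result(Dict("n" => 100))` returns a dictionary holding the original entry plus a "gitcommit" key, which can then be written to disk alongside the data.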

@heliosdrm, your answers are incredible! Thank you for taking all that time to write them up. I think there are a few caveats to what you wrote, but correct me if they don’t apply:

  1. You can’t ]add a package from a local path if it’s not its own git repository. You have to use ]dev.
  2. If you include("MyProjectFunctions.jl") containing a module MyProjectFunctions, you need to prefix the module name with a . when using or importing it, i.e. using .MyProjectFunctions will work but using MyProjectFunctions will not, because the latter searches for the module as a package among the environment’s dependencies in Project.toml.
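A minimal sketch of the second caveat, meant to be run at top level in a script or the REPL. The file is written to a temporary directory only to keep the example self-contained, and the function `double` is a hypothetical placeholder.

```julia
# Write a throwaway file; in a real project MyProjectFunctions.jl would
# already exist next to the script.
path = joinpath(mktempdir(), "MyProjectFunctions.jl")
write(path, """
module MyProjectFunctions
export double
double(x) = 2x   # hypothetical function, for illustration only
end
""")

include(path)               # defines the module inside the current module (Main)
using .MyProjectFunctions   # leading dot: resolve relative to the current module

println(double(21))         # prints 42
```

Dropping the dot makes Julia look for a package named MyProjectFunctions in the active environment instead, which fails unless such a package has actually been added or dev’ed there.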

This raises a related issue when using Revise, which seems like a must in pretty much any Julia workflow. Based on the Revise documentation, one’s choices are either to replace all include() statements with includet() or to keep all modules in packages. If you do the former, you’re relying on Revise being available in every environment you ever run your code in, which may be a bad assumption for code that has to run in batch mode on a high-performance cluster. That basically leaves you with having to create packages for every bit of non-trivial code. And once you create packages, you have to manage their dependencies/environments in addition to your project’s, i.e. a Revise-based workflow comes at a substantial cost. Or am I missing something here?

Finally, @johnh, if you use Git LFS for storing large data files, do you store the large files on a remotely hosted repo, e.g. GitHub? Do you pay for extra space? It would be nice to have git track the metadata of large binary files while the actual files are stored somewhere else, e.g. Dropbox, a shared network drive, AWS, a university HPC cluster, etc.
