How do you organise Julia workloads that might take several hours to run and have a number of intermediate processing steps and data products?
I have a data-processing toolchain built from Julia shebang scripts and orchestrated with GNU make.
It works, but as the workload grows, the Julia start-up overhead is becoming more significant.
I wonder whether there is a way to precompile shebang scripts, but this doesn’t feel like the “Julian way”.
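For reference, one way to cut the start-up overhead without abandoning shebang scripts is a custom system image built with PackageCompiler.jl. A minimal sketch (`create_sysimage` is the package's real entry point; the package list and output file name are placeholders for whatever your scripts actually load):

```julia
# Hedged sketch: build a system image with your pipeline's packages baked in.
# The package list and file name are hypothetical placeholders.
using PackageCompiler

create_sysimage([:CSV, :DataFrames];           # packages your scripts load
                sysimage_path = "pipeline.so")
```

Scripts can then point their shebang at the image, e.g. `#!/usr/bin/env -S julia -Jpipeline.so` (`-J`/`--sysimage` is a standard Julia flag; the `-S` option requires a GNU coreutils `env` that supports it).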
I have found https://github.com/nolta/Jake.jl, https://github.com/kshramt/Juke.jl and https://github.com/tshort/Maker.jl but I’d be interested to hear what other people are using in practice please.
I’m very interested in hearing others’ solutions to this problem as well.
At the moment I also use a bunch of poorly organized Julia shebang scripts (lumping them into a Makefile is a brilliant idea, actually). In my case the start-up overhead is not an issue because each step takes at least 10 minutes, and I’m constantly relaunching Julia with different numbers of workers spread across different machines.
Of the packages you linked, only Juke.jl seems to be maintained; the rest look like abandonware.
make, for the following reason: I fiddle around with my Julia installation (including the packages) too much for it to provide a stable environment for days. If things break and come to a stop, this setup ensures that it happened because the governed processes failed, not the governor. Also, since the state is effectively the filesystem for make, it is very robust.
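That filesystem-as-state pattern is just ordinary make dependency tracking. A minimal sketch, with hypothetical script and file names:

```make
# Hypothetical two-stage pipeline; each stage is a Julia shebang script.
clean.csv: raw.csv clean.jl
	./clean.jl raw.csv > clean.csv

summary.txt: clean.csv summarize.jl
	./summarize.jl clean.csv > summary.txt
```

If `summary.txt` exists and is newer than its prerequisites, make does nothing; touch an input (or delete an intermediate file) and only the affected stages re-run.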
The most efficient alternative would be to write functions and modules, not scripts, and call them from within a single Julia program. Pass data around in-memory rather than piping through files.
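A minimal sketch of that structure, with hypothetical stage functions composed inside one process:

```julia
# Hedged sketch: stages as functions, data passed in memory, not via files.
# The stage bodies are placeholders for real processing steps.
module Pipeline

load(path)      = readlines(path)          # stage 1: read records
clean(recs)     = filter(!isempty, recs)   # stage 2: drop blank lines
summarize(recs) = length(recs)             # stage 3: reduce to a result

run_all(path) = summarize(clean(load(path)))

end # module
```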
Thanks, but I’m inclined to agree with Tamas: my data is large (larger than my physical memory) and takes hours to process. If the process gets killed 7 hours into a 10-hour run, I don’t want to have to restart from scratch.
If your processing runs for hours but the Julia start-up time is significant, then the granularity of your Julia scripts is far too small. Write scripts that do more, using functions, not scripts, as the unit of granularity. You can still use files and scripts at higher levels, for checkpointing and out-of-core processing.
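One way to combine the two, sketched with the Serialization standard library (the stage bodies and file names are placeholders): coarse stages checkpoint their results to disk, so a killed run resumes from the last completed stage instead of restarting.

```julia
using Serialization

# Run f() unless a checkpoint file already holds its result.
function checkpointed(f, path)
    isfile(path) && return deserialize(path)   # resume: reuse earlier result
    result = f()
    serialize(path, result)                    # checkpoint for future restarts
    return result
end

# Hypothetical stages; the real ones would each run for hours.
stage1()  = sum(1:1_000_000)
stage2(x) = 2x

a = checkpointed(stage1, "stage1.jls")
b = checkpointed(() -> stage2(a), "stage2.jls")
```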