Earlier this week I gave a presentation on Julia for data science and ML. One question came up to which I could only give a non-committal answer. Paraphrasing: once our notebooks work fine, how do we deploy at scale on various infrastructure solutions?
I did a bit of research and came up with:
Out of the box, without additional packages, Julia can be executed on several worker processes on a single machine or on several machines connected through ssh.
There is a community focused on running Julia on parallel workloads (JuliaParallel · GitHub). In particular, they oversee MPI wrappers and another package to deploy on many standard batch systems (e.g. Slurm).
The most prominent Julia consultancy (Julia Computing) has a proprietary solution called JuliaRun.
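For the first option, here is a minimal sketch using only the Distributed standard library on a single machine. The worker count and the function `slow_square` are made up for illustration.

```julia
using Distributed

# Start two local worker processes (an arbitrary count for this sketch).
addprocs(2)

# Define the function on every process, including the new workers.
@everywhere slow_square(x) = (sleep(0.1); x^2)

# pmap distributes the calls across the available workers
# and returns the results in input order.
results = pmap(slow_square, 1:8)
println(results)

# Release the workers when done.
rmprocs(workers())
```

For several machines, `addprocs` also accepts a vector of host specifications and launches the workers over ssh, which I have not shown here because it depends on the cluster setup.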
However, I have not found a central repository of knowledge, success stories, or clear how-tos / workflows for handling that.
Are you aware of anything? Can anybody comment on how mature those options are for a high-quality production environment (think paranoid financial trading requirements)?
I’ve recently had the opportunity to help a company put Julia code into production, and we found the tooling we used to be very mature. Nothing really fancy, but things that work:
a fresh docker image is built
the Julia application is installed in the docker image (simply using Pkg)
…and compiled there using PackageCompiler
tests are run inside the docker image to check that everything works (Pkg again)
all this workflow is triggered in the CI/CD system
the docker images can then be deployed either on local resources or on cloud-computing platforms
As you can see, the entire workflow was built upon Pkg and PackageCompiler (v1) which we found to work very reliably.
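To make the steps above more concrete, here is a rough sketch of what a build script run inside the docker image could look like. This is not the actual script from that project: the package name `MyApp`, the `/app` paths, and the file name `precompile_app.jl` are all hypothetical, and the `create_sysimage` call follows the PackageCompiler v1 form, so check it against the version you have installed.

```julia
# build.jl: hypothetical build script, meant to run inside the docker image.
using Pkg

# Assumption: the application is a package named MyApp located at /app.
Pkg.activate("/app")
Pkg.instantiate()   # install the dependencies recorded in the manifest

# Assumption: PackageCompiler is among the project's dependencies.
using PackageCompiler

# Build a custom system image containing the application.
create_sysimage(
    [:MyApp];
    sysimage_path = "/app/MyApp.so",
    precompile_execution_file = "/app/precompile_app.jl",
)

# Run the application's test suite; this throws if a test fails,
# which makes the CI/CD job fail as well.
Pkg.test()
```

The resulting system image can then be loaded with `julia --sysimage /app/MyApp.so`.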
In other posts, many have pointed out that the initial compile time when spinning up a docker image was an issue. Does PackageCompiler address that completely? What about a snapshotted VM image that is ready to go?
In my experience, with a good precompile_execution_file (not always easy to provide), the time needed to spawn a new Julia process and load all packages in the environment is reduced to something like 1s (max). The first run of every function might sometimes still be a bit slower than usual if additional compilation is required, for functions which could not be captured in the system image. But at least that mostly eliminates the latency problem.
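To illustrate, a precompile_execution_file is just an ordinary Julia script that exercises the code paths you care about, with the argument types used in production. Below is a small sketch: `DemoApp` and its functions are hypothetical stand-ins for the real application (which you would normally load with `using`), and the timings at the end only show the first-call effect mentioned above.

```julia
# Hypothetical stand-in for the real application module.
module DemoApp
# Compound a spot value at a continuous rate (illustrative only).
price(spot, rate, years) = spot * exp(rate * years)
# Return (minimum, maximum, mean) of a collection.
summarize(xs) = (minimum(xs), maximum(xs), sum(xs) / length(xs))
end

# The first call to a method with new argument types includes compilation.
first_call = @elapsed DemoApp.summarize([1.0, 2.0, 3.0])
# The second call reuses the compiled code.
second_call = @elapsed DemoApp.summarize([1.0, 2.0, 3.0])
println("first call: ", first_call, " s, second call: ", second_call, " s")

# Exercise the other entry points and argument types as well,
# so that they are compiled too.
DemoApp.summarize([1, 2, 3])
DemoApp.price(100.0, 0.05, 2.0)
```

Any call that the script does not exercise can still trigger compilation at run time, which is why a good file is not always easy to provide.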
I guess everything depends on the use case, and especially the expected run time of the Julia process.
A snapshotted VM image might very well be a good idea, but I have never tried it.
If you choose the docker route, then SimpleContainerGenerator.jl could be useful. I have yet to try it out, but it looks like it will automate a lot of the boilerplate.