Deployment at scale

Emmanuel-R8 · May 1, 2020, 9:11am

Earlier this week I gave a presentation on Julia for data science and ML. One question came up to which I could only give a non-committal answer. Paraphrasing: once our notebooks work fine, how do we deploy at scale on various infrastructure solutions?

I did a bit of research and came up with:

Out of the box, without additional packages, Julia can be executed on several worker processes on a single machine or on several machines connected through ssh.
staticfloat/julia-docker on GitHub ("Various Dockerfiles for Julia") has a number of Dockerfiles.
There is a community focused on running Julia on parallel workloads (JuliaParallel on GitHub). In particular, they oversee MPI wrappers and another package for deploying on many standard batch systems (e.g. Slurm).
Dokku (Dokku - The smallest PaaS implementation you've ever seen) and Elasticluster (https://elasticluster.readthedocs.io/en/) have examples for Julia.
The most prominent Julia consultancy company (Julia Computing) has a proprietary solution called JuliaRun.
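
To make the first point concrete, here is a minimal sketch using only the Distributed standard library. The function name `simulate` and the host name in the comment are placeholders I made up for illustration; this is one way to use the built-in workers, not a recommended production setup.

```julia
using Distributed

# Start two local worker processes. For remote machines reachable over
# passwordless ssh, addprocs also accepts host specifications, e.g.
# addprocs([("user@node1", 2)]); "user@node1" is a placeholder host.
addprocs(2)

# Definitions must exist on every worker before they can be called there.
@everywhere simulate(seed) = sum(abs2, sin.(seed .* (1:1000)))

# pmap distributes the calls across the available workers.
results = pmap(simulate, 1:8)

println("workers: ", nworkers())
println("results computed: ", length(results))

# Release the worker processes once the job is finished.
rmprocs(workers())
```

The same script should run unchanged on a laptop or a small ssh-connected cluster; only the `addprocs` call would differ.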

However, I have not found a central repository of knowledge, success stories or clear how-tos / workflows for handling that.

Are you aware of anything? Can anybody comment on how mature those options are for a production, high quality environment (think paranoid financial trading requirements)?

Thanks.

johnh · May 1, 2020, 12:03pm

This is a very good discussion.
The ‘classic’ HPC answer is to use a job scheduler such as Slurm.

I guess I would ask them what their current method of deploying Docker containers is.
The answer is likely to be Kubernetes.

Personally I rather like Singularity containers - which are inherently secure and you can ‘read in’ a Docker container

I rather like the concepts of Nomad also, and Singularity fits with it

johnh · May 1, 2020, 12:08pm

If you are working in the deep learning field there are several frameworks for deploying models - I guess most assume Kubernetes.
For example Seldon (linked page: "Seldon Tech Ethics Meetup: AI, Data and Ethics with Prof. Joanna Bryson") and Hopsworks (https://www.logicalclocks.com/).

I am unaware of Seldon/Hopsworks being used with Julia

johnh · May 1, 2020, 12:09pm

Pushing Singularity again, it has signed containers, which may be important in security-conscious environments.
https://sylabs.io/guides/3.5/user-guide/signNverify.html

ffevotte · May 1, 2020, 6:57pm

Julia Computing recently offered a webinar on “Building Production Applications Using Julia”, led by @avik, which covered some aspects related to deployment. Hopefully a recorded version will soon be available online.

ffevotte · May 1, 2020, 7:15pm

I’ve recently had the opportunity to help a company put some Julia code into production, and we found the tooling we used to be very mature. Nothing really fancy, but things that work:

a fresh docker image is built
the Julia application is installed in the docker image (simply using Pkg)
…and compiled there using PackageCompiler
tests are run inside the docker image to check that everything works (Pkg again)
all this workflow is triggered in the CI/CD system
the docker images can then be deployed either on local resources or on cloud-computing platforms

As you can see, the entire workflow was built upon Pkg and PackageCompiler (v1) which we found to work very reliably.

Emmanuel-R8 · May 2, 2020, 6:32am

Thanks.

In other posts, many have pointed out that the initial compile time when spinning up a docker image was an issue. Does PackageCompiler address that completely? What about a snapshotted VM image, ready to go?

ffevotte · May 2, 2020, 2:45pm

In my experience, with a good precompile_execution_file (not always easy to provide), the time needed to spawn a new Julia process and load all packages in the environment is reduced to something like 1s (max). The first run of every function might sometimes still be a bit slower than usual if additional compilation is required, for functions which could not be captured in the system image. But at least that mostly eliminates the latency problem.
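
As a rough illustration of what such a file can look like: it is an ordinary Julia script that exercises the code paths you want compiled into the system image. The module `DemoApp` and its functions are hypothetical stand-ins defined inline so that the sketch runs on its own; in a real project the script would start with `using` your own package instead.

```julia
# precompile_app.jl: hypothetical workload script for PackageCompiler.
# In a real project this would begin with `using MyApp`; a small stand-in
# module is defined inline here so the sketch is self-contained.
module DemoApp

export load_prices, moving_average

# Parse one comma-separated line of numbers into a Vector{Float64}.
load_prices(line::AbstractString) = parse.(Float64, split(line, ','))

# Simple moving average over a window of `n` points.
function moving_average(xs::AbstractVector{<:Real}, n::Integer)
    return [sum(@view xs[i:i+n-1]) / n for i in 1:length(xs)-n+1]
end

end # module

using .DemoApp

# Call each entry point with the argument types expected in production, so
# that the corresponding method specializations get compiled and recorded.
prices = load_prices("1.0,2.0,3.0,4.0,5.0")
averages = moving_average(prices, 2)
moving_average(Float32.(prices), 3)  # also cover a second element type
```

The path to a script like this is what gets passed as the `precompile_execution_file` keyword when building the system image. Anything the script does not call, or calls with different argument types than production does, may still be compiled on first use.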

I guess everything depends on the use case, and especially the expected run time of the Julia process.

As for a snapshotted VM image: that might very well be a good idea, but I never tried…

onetonfoot · May 2, 2020, 4:02pm

If you choose the docker route then SimpleContainerGenerator.jl could be useful. I’m yet to try it out, but it looks like it will automate a lot of the boilerplate.
