How does one set up a centralized Julia installation?

Here is what I contributed to the julia-in-production BoF channel at JuliaCon 2020:


Hi all,

I work as a computational scientist at the Swiss National Supercomputing Centre (CSCS) and am responsible for Julia computing at CSCS. We put Julia into production in September last year on the Piz Daint GPU supercomputer, which hosts 5704 GPUs [0] (in 2017/2018, Piz Daint was listed for one year as number 3 in the world on top500.org). In August, we will also make it available on the CSCS JupyterHub service. BTW, we also have a page on the CSCS user portal dedicated to Julia [1].

I will try to summarize our approach to providing Julia on the supercomputer and point out difficulties and things that could be improved on the Julia side, in particular in the package manager.

At CSCS, the entire scientific software stack is built with the Python-based ‘EasyBuild’ [2]. It is important to know that Piz Daint contains a GPU partition (‘gpu’) and a multicore partition (‘mc’). We therefore created EasyBuild recipes for two modules, ‘Julia’ and ‘JuliaExtensions’, for each partition, ‘gpu’ and ‘mc’ (as is done for the whole scientific software stack). The modules contain the following:

  • Julia-mc [3]: Julia + MPI.jl
  • Julia-gpu [4]: Julia + MPI.jl + CUDA.jl
  • JuliaExtensions [5,6]: some additional packages like Plots.jl etc.

The modules set up a stacked environment. The easyconfig files [3-6] are very high-level and declarative; the magic happens in the easyblock files [7-9]. The easyblock files are quite difficult to read if one is not used to working with them, so I will try to summarize the most important points. For example, the Julia-gpu module sets the following environment variables:

setenv  JULIA_CUDA_USE_BINARYBUILDER  false
setenv  JULIA_PROJECT                 ~/.julia/1.4.2/daint-gpu/environments/1.4.2-daint-gpu
setenv  JULIA_DEPOT_PATH              ~/.julia/1.4.2/daint-gpu:<path-CUDA-MPI>:<path-Julia>
setenv  EBJULIA_USER_DEPOT_PATH       ~/.julia/1.4.2/daint-gpu
setenv  EBJULIA_ADMIN_DEPOT_PATH      <path-CUDA-MPI>:<path-Julia>
setenv  JULIA_LOAD_PATH               @:@#.#.#-daint-gpu:<path-CUDA-MPI>:<path-Julia>/environments/1.4.2-daint-gpu:@stdlib
setenv  EBJULIA_USER_LOAD_PATH        @:@#.#.#-daint-gpu
setenv  EBJULIA_ADMIN_LOAD_PATH       <path-CUDA-MPI>/environments/1.4.2-daint-gpu:@stdlib

where we have:

<path-CUDA-MPI>: path to CUDA.jl and MPI.jl
<path-Julia>: path to Julia base installation
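
How the `@#.#.#-daint-gpu` entry in the load path relates to the `environments/1.4.2-daint-gpu` directory in JULIA_PROJECT can be sketched as follows. This is a minimal sketch, assuming Julia's rule for named environments: each `#` in the name is replaced in turn by the major, minor and patch version, and the resulting name is looked up under `environments/` in each depot. The helper names `expand_env_name` and `candidate_env_dirs` and the depot paths are hypothetical and chosen only for illustration.

```julia
# Hypothetical helper: turn "@#.#.#-daint-gpu" into "1.4.2-daint-gpu"
# for a given Julia version (defaults to the running version).
function expand_env_name(entry::AbstractString, v::VersionNumber=VERSION)
    name = entry[2:end]  # drop the leading '@' (ASCII, so byte indexing is safe)
    for part in (v.major, v.minor, v.patch)
        # replace one '#' at a time, in major/minor/patch order
        name = replace(name, "#" => string(part); count=1)
    end
    return name
end

# Hypothetical helper: list the directories in which a named environment
# would be searched, one per depot, in depot order.
function candidate_env_dirs(entry::AbstractString,
                            depots::AbstractVector{<:AbstractString},
                            v::VersionNumber=VERSION)
    name = expand_env_name(entry, v)
    return [joinpath(depot, "environments", name) for depot in depots]
end

# Placeholder depots standing in for the user depot and the two admin depots.
depots = split("/home/user/.julia/1.4.2/daint-gpu:/opt/cuda-mpi:/opt/julia", ':')
for dir in candidate_env_dirs("@#.#.#-daint-gpu", depots, v"1.4.2")
    println(dir)
end
```

With the user depot listed first, the first candidate corresponds to the JULIA_PROJECT directory shown above.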

Then, the EasyBuild installation also creates a Julia ‘startup.jl’ file, which fixes the DEPOT_PATH and the LOAD_PATH to get the following (there is a little more code in there):

DEPOT_PATH .= [USER_DEPOT_PATH; ADMIN_DEPOT_PATH]
LOAD_PATH .= [USER_LOAD_PATH; ADMIN_LOAD_PATH]
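
The two lines above are shorthand. A self-contained version could look roughly like the following. This is a minimal sketch and not the actual CSCS startup.jl: it assumes the user and admin parts are read from the EBJULIA_* variables set by the module, and the helper names `stacked_paths` and `reset_path!` are hypothetical. It uses `empty!` and `append!` rather than `.=`, because broadcast assignment only works when the old and new vectors have the same length.

```julia
# Hypothetical helper: split two colon-separated lists and stack them,
# user entries first so that they take precedence.
function stacked_paths(user::AbstractString, admin::AbstractString)
    parts(s) = String.(split(s, ':'; keepempty=false))
    return [parts(user); parts(admin)]
end

# Hypothetical helper: overwrite a path vector in place, whatever its length.
function reset_path!(target::Vector{String}, entries::Vector{String})
    empty!(target)
    append!(target, entries)
    return target
end

# Only touch the global paths when the module has set the EBJULIA_* variables.
if haskey(ENV, "EBJULIA_USER_DEPOT_PATH") && haskey(ENV, "EBJULIA_ADMIN_DEPOT_PATH")
    reset_path!(DEPOT_PATH, stacked_paths(ENV["EBJULIA_USER_DEPOT_PATH"],
                                          ENV["EBJULIA_ADMIN_DEPOT_PATH"]))
end
if haskey(ENV, "EBJULIA_USER_LOAD_PATH") && haskey(ENV, "EBJULIA_ADMIN_LOAD_PATH")
    reset_path!(LOAD_PATH, stacked_paths(ENV["EBJULIA_USER_LOAD_PATH"],
                                         ENV["EBJULIA_ADMIN_LOAD_PATH"]))
end
```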

The result for the user is the following:

  1. when loading Julia[Extensions], Julia and the additional packages are immediately usable; precompiled files go into the user's home directory, e.g. ~/.julia/1.4.2/daint-gpu for GPU.

  2. when a user installs a package, it goes by default into their home directory, e.g. ~/.julia/1.4.2/daint-gpu for GPU.

  3. a user can install a different version of a provided package, and Julia will automatically pick up the user's installation, because the USER_DEPOT_PATH and the USER_LOAD_PATH take precedence.
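
The precedence in item 3 can be illustrated by walking a list of environments in order and recording, for each package, the first environment that lists it. This is a minimal sketch under two assumptions: every entry is a directory containing a Project.toml (real load-path entries such as `@` or `@stdlib` are not handled), and the `TOML` standard library is available, which requires Julia 1.6 or later rather than the 1.4.2 used here. The function name `visible_packages` is hypothetical.

```julia
using TOML  # standard library in Julia 1.6 and later (assumption, see above)

# Hypothetical helper: map each package name to the first environment
# directory (in the given order) whose Project.toml lists it under [deps].
function visible_packages(env_dirs::AbstractVector{<:AbstractString})
    providers = Dict{String,String}()
    for dir in env_dirs
        project = joinpath(dir, "Project.toml")
        isfile(project) || continue            # skip entries without a project file
        deps = get(TOML.parsefile(project), "deps", Dict{String,Any}())
        for name in keys(deps)
            # get! keeps an existing entry, so earlier environments win
            get!(providers, name, String(dir))
        end
    end
    return providers
end
```

Called with the user environment first and the admin environment second, a package listed in both maps to the user environment, which is the behaviour described in item 3.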

Now, let’s talk about what could be improved from my point of view. The main things that come to mind are:

  1. It is unfortunate that the DEPOT_PATH and the LOAD_PATH need to be reshuffled in startup.jl. This breaks, for example, the unit tests of certain packages, in particular when they are called directly from the package manager. It would be good if Pkg allowed setting ADMIN_DEPOT_PATH, USER_DEPOT_PATH, ADMIN_LOAD_PATH and USER_LOAD_PATH and built the DEPOT_PATH and LOAD_PATH from them.

  2. Querying the available packages in the stacked environment is not obvious. Pkg could improve this.

  3. This all works pretty well for small or medium jobs, but for large jobs with thousands of GPUs it is not good that every process needs to read from the same files (stacked environment / home). For large-scale deployment, it would be nice to be able to create a binary or folder that can be copied at the beginning of a job to the RAM disk (/tmp) of each compute node, so that each compute node reads only from there during the job. I have successfully run on up to a thousand GPUs using such an approach in a “manual way”. It would be great if e.g. PackageCompiler.jl could help with that. BTW: a discussion that I initiated on this topic on Julia discourse a year ago did not lead to a solution then [10].
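
The staging idea in item 3 could be sketched along the following lines. This is a minimal sketch and not the “manual way” actually used at CSCS, whose details are not given here. It assumes that one process per node copies a depot directory to node-local storage before the other processes start, and that the depot contains nothing tied to its original location. The function names `stage_depot` and `use_staged_depot!` and the target folder name `julia_depot` are hypothetical.

```julia
# Hypothetical helper: copy a depot to node-local scratch space (e.g. /tmp)
# unless a staged copy already exists, and return the staged location.
function stage_depot(depot::AbstractString, scratch::AbstractString=tempdir())
    target = joinpath(scratch, "julia_depot")
    isdir(target) || cp(depot, target)   # cp copies directories recursively
    return target
end

# Hypothetical helper: point an existing depot list at the staged copy,
# keeping the position (and therefore the precedence) of the entry.
function use_staged_depot!(depot_path::Vector{String},
                           depot::AbstractString, staged::AbstractString)
    return replace!(depot_path, String(depot) => String(staged))
end

# Possible use at the start of a job, with <path-CUDA-MPI> as in the module:
#   staged = stage_depot("<path-CUDA-MPI>", "/tmp")
#   use_staged_depot!(DEPOT_PATH, "<path-CUDA-MPI>", staged)
```

The sketch does not guard against several processes on one node copying at the same time; in a real job the copy would need to be done once per node before Julia processes that read from it are launched.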

We intend to share our EasyBuild recipes in the official EasyBuild repository. Some work is still missing for this, however: building Julia from source, possibly making parts of the recipes more generic, and doing some cleanup.

@Fredrik Ekre and @Kristoffer Carlsson, could you comment on the points related to the Pkg manager and PackageCompiler.jl?

Looking forward to a great and productive discussion with all of you!

Thanks!!

Sam

REFERENCES

[0] Piz Daint | CSCS
[1] Julia
[2] https://github.com/easybuilders/easybuild
[3] https://github.com/eth-cscs/production/blob/master/easybuild/easyconfigs/j/Julia/Julia-1.4.2-CrayGNU-19.10.eb
[4] https://github.com/eth-cscs/production/blob/master/easybuild/easyconfigs/j/Julia/Julia-1.4.2-CrayGNU-19.10-cuda-10.1.eb
[5] https://github.com/eth-cscs/production/blob/master/easybuild/easyconfigs/j/JuliaExtensions/JuliaExtensions-1.4.2-CrayGNU-19.10.eb
[6] https://github.com/eth-cscs/production/blob/master/easybuild/easyconfigs/j/JuliaExtensions/JuliaExtensions-1.4.2-CrayGNU-19.10-cuda-10.1.eb
[7] https://github.com/eth-cscs/production/blob/master/easybuild/easyblocks/julia.py
[8] https://github.com/eth-cscs/production/blob/master/easybuild/easyblocks/juliabundle.py
[9] https://github.com/eth-cscs/production/commit/effe98049685427ed6bf7a6e5431bd811a85e8a0
[10] Run a julia application at large scale (on thousands of nodes) - #6 by johnh
