I’ve been using Julia on clusters, and here are some of my observations / recommendations.
(1) I often encounter old Julia installations on clusters, so I install the latest stable version myself.
(2) If you install Julia via juliaup, it adds $HOME/.juliaup/bin to the PATH environment variable of your login shell by modifying e.g. $HOME/.bashrc. However, when you run a job through the cluster’s queuing system, this PATH change may not be in effect, so you need to add the above path manually in your job script, or alternatively call the julia binary directly, which juliaup places at e.g.
$HOME/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/bin/julia
Alternatively you could directly download Julia’s binary tarballs without juliaup.
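As a rough sketch of the first option in (2), a job script could start with something like the following. It assumes the default juliaup location $HOME/.juliaup/bin and a POSIX-like shell; the variable name JULIAUP_BIN is only for illustration, and scheduler directives are left out because they differ between clusters.

```shell
#!/bin/sh
# Assumption: juliaup was installed in its default location.
JULIAUP_BIN="$HOME/.juliaup/bin"

# Prepend the directory to PATH only if it is not already there.
case ":$PATH:" in
    *":$JULIAUP_BIN:"*) ;;
    *) export PATH="$JULIAUP_BIN:$PATH" ;;
esac

# Report which Julia the job would use, or warn if none is found.
if command -v julia >/dev/null 2>&1; then
    julia --version
else
    echo "julia not found on PATH; check $JULIAUP_BIN" >&2
fi
```

The check before prepending just avoids duplicate PATH entries if the login shell's startup file was read after all.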
(3) Starting from version 1.9, Julia caches compiled native code, which can cause crashes on clusters where different nodes have different (variations of) CPU architectures. So I always set the environment variable
export JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"
before running Julia, to make sure that the cached binaries are compatible with all the nodes that potentially run my job. The above line needs to be customized if you care about specifying the actual CPU architecture on your cluster to maximize performance. (See documentation here.)
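To make the structure of that value a bit more visible, here is a small sketch that sets the variable and lists the targets it contains. The string is a semicolon-separated list of targets, each being a CPU name followed by comma-separated features or flags. The last step, printing Sys.CPU_NAME, only runs if julia is on PATH; running it on each type of node is one way (my assumption, not the only one) to find out which CPU names matter on your cluster.

```shell
#!/bin/sh
# Same value as in the text above.
export JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"

# Split on ';' and print the CPU name of each target (the part before the first ',').
printf '%s\n' "$JULIA_CPU_TARGET" | tr ';' '\n' | while IFS= read -r target; do
    echo "target: ${target%%,*}"
done

# Optional: show the CPU name Julia detects on the current node.
if command -v julia >/dev/null 2>&1; then
    julia -e 'println(Sys.CPU_NAME)'
fi
```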
(4) Some clusters have tight resource limits for login nodes, so I need to set the environment variable
export OMP_NUM_THREADS=1
to be able to launch julia successfully on login nodes. Additionally, the resource limits of login nodes can cause precompilation to fail when I add a package via Pkg, so I sometimes request an interactive compute job just to run the Pkg operations.
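For what it is worth, on a Slurm cluster the Pkg step could look roughly like the sketch below. Everything scheduler-specific here is an assumption: the use of srun, the resource values in SRUN_OPTS, and the variable names are placeholders, and other queuing systems have their own equivalents. This variant runs the Pkg commands non-interactively on a compute node; adding --pty to the srun options and dropping the -e part would give an interactive session instead.

```shell
#!/bin/sh
# Keep threaded libraries at one thread so julia can start on the login node.
export OMP_NUM_THREADS=1

# Hypothetical Slurm resource request; adjust to your cluster's limits.
SRUN_OPTS="--ntasks=1 --cpus-per-task=4 --mem=8G --time=00:30:00"

# Pkg operations to run on the compute node instead of the login node.
PKG_CODE='using Pkg; Pkg.instantiate(); Pkg.precompile()'

if command -v srun >/dev/null 2>&1 && command -v julia >/dev/null 2>&1; then
    # SRUN_OPTS is intentionally unquoted so it splits into separate options.
    srun $SRUN_OPTS julia -e "$PKG_CODE"
else
    echo "would run: srun $SRUN_OPTS julia -e '$PKG_CODE'"
fi
```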
(5) If you want to set up a Julia installation for other users, it’s best to just provide Julia itself but not packages, though there are ways to provide a centralized package depot by using the environment variable JULIA_DEPOT_PATH. (The details are a little bit tricky and discussed elsewhere.)
(6) Set julia’s --heap-size-hint command line option, available since v1.9, to the amount of memory you requested in the queuing system. Otherwise Julia may assume it has access to all the RAM of a node and not run the garbage collector aggressively enough.
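A minimal sketch of (6), assuming Slurm: when a job is submitted with --mem, Slurm exports SLURM_MEM_PER_NODE with the requested memory in megabytes, which can be passed straight to --heap-size-hint. The 4096 MB fallback, the variable names, and the script name my_script.jl are placeholders for illustration.

```shell
#!/bin/sh
# Assumption: SLURM_MEM_PER_NODE holds the requested memory in MB.
# Fall back to a hypothetical 4096 MB when it is not set.
MEM_MB="${SLURM_MEM_PER_NODE:-4096}"

# --heap-size-hint accepts a size with a unit suffix such as M or G.
HEAP_HINT="--heap-size-hint=${MEM_MB}M"

# Print the command line the job would run (my_script.jl is a placeholder).
echo "julia $HEAP_HINT my_script.jl"
```

If the job shares its memory allocation with other processes, a smaller hint than the full request may be more appropriate.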
P.S. I also hope that some kind of “official” guide is available, as I stumbled upon these issues in bits and pieces before finding out the solutions.