I have found that many of the (not so well) documented workflows for using Distributed.jl to distribute processes across several nodes of a Slurm cluster have drawbacks, especially on clusters that do not allow SSH connections to the compute nodes. After scratching my head for a while, I managed to find a more or less compact way to do the parallel calculation efficiently, which I want to share for those looking for the same thing, and also to ask for comments and improvements.
I have prepared an MWE, but here I explain it in more detail.
First, I have noticed that the `addprocs_slurm()` function of ClusterManagers.jl fails about half of the time on some of the clusters where I have tried it. I don't know the reason; sometimes it works, sometimes the Slurm tasks return timeout exceptions. I have found that SlurmClusterManager.jl works fine.
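For reference, this is the swap in question; a minimal sketch, assuming both ClusterManagers.jl and SlurmClusterManager.jl are in the environment (the commented-out call is the one that intermittently timed out for me):

```julia
using Distributed

# using ClusterManagers
# addprocs_slurm(parse(Int, ENV["SLURM_NTASKS"]))  # intermittently timed out for me

using SlurmClusterManager
addprocs(SlurmManager())  # reads the allocation (ntasks, nodes) from the Slurm environment
```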
Second, for precompilation to be performed only once (and not once per worker), all workers need to access the same Julia depot and the same Manifest.toml. However, it seems that the corresponding environment variables cannot be changed from within Julia code; they have to be set before running julia.
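Once the workers are up, this can be checked from the master process; a minimal sketch of such a sanity check (not part of the scripts below):

```julia
using Distributed

# Check that every worker sees the same depot and active project as the master,
# so packages are precompiled once in the shared depot rather than per worker.
for w in workers()
    depot = remotecall_fetch(() -> first(DEPOT_PATH), w)
    proj  = remotecall_fetch(() -> Base.active_project(), w)
    @assert depot == first(DEPOT_PATH)
    @assert proj == Base.active_project()
end
```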
The workflow I now follow consists in running `bash launcher_cluster.sh` from the cluster access node. This script is the one I edit to set the `sbatch` parameters, and it looks like this:
```bash
#!/bin/bash
source prolog.sh

sbatch <<EOT
#!/bin/bash
## Slurm header
#SBATCH --partition=esbirro
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2G
#SBATCH --output="slurm.out/%j.out"

julia --project launcher.jl
EOT
```
First it runs `prolog.sh`:
```bash
#!/bin/bash
depot_path="$PWD/.julia_depot"
mkdir -p "$depot_path"
export JULIA_DEPOT_PATH="$depot_path"
export JULIA_PROJECT="$PWD"

julia --project prolog.jl
```
which creates a directory for the Julia depot on the cluster's shared filesystem and sets the appropriate environment variables: `JULIA_DEPOT_PATH` to point to this depot, and `JULIA_PROJECT` to point to the directory that contains the Manifest.toml. Since they are set before running julia, they will be visible to all the workers we launch. Then this prologue executes a Julia script,
```julia
using Pkg
Pkg.instantiate()
Pkg.resolve()
Pkg.precompile()
```
to instantiate and precompile, if necessary. Finally, `launcher_cluster.sh` runs `launcher.jl`, which actually opens the parallel workers and runs the code. It looks something like this:
```julia
#!/usr/bin/env -S julia --project

## Julia setup
script_path = ENV["SLURM_SUBMIT_DIR"]
using Distributed, SlurmClusterManager
addprocs(SlurmManager())

## Run code
include("$(script_path)/src/main.jl")

## Clean up
rmprocs(workers()...)
```
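One variant I have been considering (a sketch of a possible improvement, not what I currently run) wraps the `include` in `try`/`finally`, so that the workers are always released even if `main.jl` throws:

```julia
#!/usr/bin/env -S julia --project
using Distributed, SlurmClusterManager

addprocs(SlurmManager())
try
    # Same entry point as above; SLURM_SUBMIT_DIR is set by Slurm to the
    # directory from which the job was submitted.
    include(joinpath(ENV["SLURM_SUBMIT_DIR"], "src", "main.jl"))
finally
    rmprocs(workers())   # clean up the workers no matter how main.jl exits
end
```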
All the actual code is in `src/main.jl`, where I can write all my `using` and `include` statements, even with relative paths, as if I were writing code for a local distributed calculation.
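For concreteness, a minimal sketch of what `src/main.jl` could look like (the computation here is a hypothetical placeholder, not my actual code):

```julia
using Distributed

# Load dependencies on all workers; they hit the same shared depot,
# so nothing is precompiled again at this point.
@everywhere using LinearAlgebra

# Hypothetical embarrassingly parallel workload spread over the Slurm tasks.
results = pmap(1:1000) do i
    A = randn(100, 100)
    maximum(abs, eigvals(A)) / i
end

println("Collected $(length(results)) results on $(nworkers()) workers.")
```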
I have tested it on several Slurm clusters and it works pretty well. The worker launch time has dropped significantly compared to instantiating and precompiling on each worker, and I have never run into a timeout exception.
What do you think about this workflow? Do you see any possible improvements?