Distributed Computing with Slurm and Julia

I’m looking for some guidance on how to get started with the following scenario. I have a cluster with multiple nodes, each with 48 cores. Job scheduling is done with Slurm. So far, I’ve been running single-node distributed jobs of the form (computation1.jl):

using Distributed
using SharedArrays
@everywhere include("setup1.jl"); # preprocessing and setup code
results = SharedArray{Float64}(n_samples);
@sync @distributed for i in 1:n_samples
    results[i] = f(i); # f is defined in setup1.jl
end
# save to disk

Which is to say, this is a perfectly parallelizable job with no interaction between iterations.

This is then launched with slurm using the command sbatch job1.sh, where this job script looks like:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=48
#SBATCH --mem=128GB
#SBATCH --cpus-per-task=1
...
julia -p 48 computation1.jl

This works as expected as a single node 48 process job.

But I would like to move to running on multiple nodes, to make use of the resources I have for larger jobs. I’ve looked a bit at ClusterManagers.jl and the Julia documentation, but I’m struggling to see how to modify both my Julia code and my Slurm script to run properly. Thanks for any help.

Looking at this, I’m still a bit unsure about a few things. Looking at the bottom, I see what appear to be scripts:

#!/bin/bash
#PBS -l nodes=4:ppn=12,walltime=00:05:00
#PBS -N test_julia
#PBS -q debug

julia --machine-file=$PBS_NODEFILE main.jl

which I suspect would then be submitted to a queue. Is that correct? If so, what would be the counterpart to $PBS_NODEFILE for the slurm environment?

Maybe you can search for SLURM env variables:

https://hpcc.umd.edu/hpcc/help/slurmenv.html

PRs are welcome in that repository with a SLURM example.

Correct. You will submit the job script as usual with SLURM, probably with sbatch.

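From the other link I shared, I think you are after the variable $SLURM_JOB_NODELIST. Note that it holds the node names themselves in a compact form (for example node[01-04]) rather than the path to a file, so it has to be expanded before it can be used as a machine file.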
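As a rough sketch of the Slurm counterpart: `scontrol show hostnames` expands the compact node list to one hostname per line. The snippet below does that from inside Julia and passes the result to addprocs, which, as far as I know, is roughly what --machine-file does at startup. The helper name machine_specs is made up for illustration. The snippet assumes passwordless SSH between the nodes and that --ntasks-per-node was set in the job script; if SLURM_NTASKS_PER_NODE is absent, it falls back to one worker per node.

```julia
using Distributed

# Hypothetical helper: pair each hostname with the number of workers to start on it.
machine_specs(hosts, per_node) = [(String(h), per_node) for h in hosts]

# Only attempt this inside a Slurm allocation.
if haskey(ENV, "SLURM_JOB_NODELIST")
    # Expand e.g. node[01-04] into individual hostnames.
    hosts = split(read(`scontrol show hostnames $(ENV["SLURM_JOB_NODELIST"])`, String))
    # Assumption: the same number of tasks was requested on every node.
    per_node = parse(Int, get(ENV, "SLURM_NTASKS_PER_NODE", "1"))
    # Starts the workers over SSH on the listed hosts.
    addprocs(machine_specs(hosts, per_node))
end
```

With this in place, the script would be started as plain `julia main.jl`, without -p or --machine-file, since the workers are added from within the script.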

I have a very similar setup to yours, and I use neither sbatch nor a bash script. I would recommend using ClusterManagers.jl. This is a far easier solution.

Suppose you have a function do_large_computation() that you’d like to parallelize across nodes/cpus. You can set up your script like the following:

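using Distributed # provides addprocs, @everywhere and pmap
using ClusterManagers
# Request 500 workers; N=17 is the node count.
addprocs(SlurmManager(500), N=17, topology=:master_worker, exeflags="--project=.")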

This adds 500 worker instances over 17 nodes (I have 32 cores per node). You can ignore the topology keyword for now, and exeflags can be used to send command line arguments to each worker instance (in my case, I am activating the current env for each worker instance).

Now you can run your code as if you had done addprocs() locally. So for example, you can do something like:

@everywhere include("file.jl") # where file.jl includes your do_large_computation() function
# or 
@everywhere using PkgA # if you'd like to load a package on the worker instances

Then to run the function in a parallel manner, I simply use pmap, i.e.,

pmap(x -> do_large_computation(x), 1:nsims)

which runs your function nsims times, distributing the calls across the workers. The results are all collected in an array and passed back to the head node (or the node from which pmap was executed).
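To connect this to the loop in the original question, here is a minimal sketch that can be tried on a laptop. Two local workers stand in for the SlurmManager call, and f and n_samples are placeholder versions of what setup1.jl would define.

```julia
using Distributed

# Stand-in for addprocs(SlurmManager(...)): two local workers, for testing only.
addprocs(2)

# Placeholder for the f defined in setup1.jl.
@everywhere f(i) = sqrt(i)

n_samples = 10
# pmap hands indices to free workers and returns the results in input order.
results = pmap(f, 1:n_samples)
```

Because pmap returns an ordinary Array on the calling process, the SharedArray from the single-node version is not needed here.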

Let me know if you have other questions. It’s also a great exercise to see how ClusterManagers sets up the srun command internally, which brings a greater level of understanding.

This provides an alternative to ClusterManagers (which you can certainly use) and works for slurm with sbatch.

Just a note: SharedArrays cannot be used across nodes, because they rely on shared memory that is local to a single node. Look at DistributedArrays.jl for alternatives.

Looking at that script, my cluster requires us to load modules (i.e., Julia is not available by default). Usually this is done in the job script; how would I incorporate that?

How do you launch this job, though? My cluster is not supposed to be used interactively.

Looking at that script, my cluster requires us to load modules (i.e., Julia is not available by default). Usually this is done in the job script; how would I incorporate that?

Usually you do that outside Julia. When you log in to your cluster, load any modules you need right away (or better yet, load them in your .bashrc file so they are available as soon as you ssh in), and then start the julia process.

How do you launch this job, though? My cluster is not supposed to be used interactively.

This depends on the level of interactivity. My preferred method is to use a multiplexer like screen or tmux. This allows an interactive Julia REPL to keep running even if you log out of the ssh … and also allows you to monitor progress by printing debug/info messages right to the REPL (in fact, I use a progress bar).

I think screen should be available by default in a lot of cluster configurations. My workflow is as follows:

  1. ssh into the cluster and load any modules required (I need to load Slurm and Julia on our cluster).
  2. start a screen session.
  3. start a julia session (takes me to Julia REPL).
  4. Use ClusterManagers and pmap to start my simulations.
  5. Disconnect from screen with Ctrl-a d (this is called detaching).
  6. Exit out of the ssh connection.

You can periodically check your status by logging in and reattaching to your screen session by typing in screen -r.

A similar workflow can be had with tmux. If you really want a completely non-interactive solution, then perhaps try a combination of ClusterManagers and sbatch.
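For the non-interactive route, one possible shape is sketched below. It has not been tested on a real cluster. The idea is that the sbatch job script requests the nodes and tasks (for example --nodes=2 and --ntasks=96), loads the Julia module as it normally would, and then runs a single `julia driver.jl` with no -p flag. The file name driver.jl is made up, and the sketch assumes that setup1.jl defines both f and n_samples, as in the original question.

```julia
# driver.jl (hypothetical name), started by the job script with: julia driver.jl
using Distributed
using ClusterManagers

# Assumption: the job script set --ntasks, so Slurm exports SLURM_NTASKS.
n_workers = parse(Int, ENV["SLURM_NTASKS"])

# Launch the workers with srun; inside an sbatch job this should reuse the job's allocation.
addprocs(SlurmManager(n_workers))

# Load f and n_samples on every process, as in computation1.jl.
@everywhere include("setup1.jl")

# Collect the results on the master process, then save them to disk as before.
results = pmap(f, 1:n_samples)
```

Since the modules are loaded in the job script before julia starts, the workers launched through srun should normally inherit the same environment, though that may depend on how the cluster is configured.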