Distributing parallel tasks/workers over multiple nodes using SLURM

I’m trying to compute a series of Monte Carlo (MC) integrations on an HPC cluster using SLURM. Since the MC samples are all independent, my approach is to distribute them as parallel tasks using pmap().
Each MC sample is quite memory-intensive, so I want to allocate each task to its own CPU (with 4 GB of RAM). Each compute node has 16 CPUs, and I am aiming for 256 MC samples, so I need 16 nodes.

According to cluster support, something is not working as intended, and all of the tasks are somehow only being stuffed onto the first node, which causes them to have way too little RAM and fail.

Unfortunately, I have no idea about how to configure SLURM tasks/workers within Julia. I got a few lines of Julia code from my former supervisor and just pasted them:

using Distributed      # provides addprocs and pmap
using ClusterManagers

const SLURM = true

N_cpus = try
    parse(Int, ARGS[1])
catch
    3
end

# locally: 1 main process and N_cpus - 1 worker processes
# on the cluster: add all N_cpus processes as SLURM workers
SLURM ? addprocs(SlurmManager(N_cpus)) : addprocs(N_cpus - 1)

...

pmap(...)

In the SLURM submit file, I specified my nodes and tasks:

#SBATCH --nodes=16
#SBATCH --ntasks-per-node=16

...

echo "------------------------------------------------------------"
echo "SLURM JOB ID: $SLURM_JOBID"
echo "$SLURM_NTASKS tasks"
echo "------------------------------------------------------------"

module load julia/1.5.3
julia --project=@. julia_file.jl $SLURM_NTASKS

Can someone spot my mistake?

Try

addprocs(SlurmManager(256), N=16) 

Instead of this whole line?
SLURM ? addprocs(SlurmManager(N_cpus)) : addprocs(N_cpus - 1)

Or like this?
SLURM ? addprocs(SlurmManager(256), N=16)

The latter. I was just trying to highlight the N argument in addprocs. It tells Slurm how many nodes to use. This should set up 256 workers spread over 16 nodes.

Off topic, but worth mentioning: pmap runs on the head node, and all the results of the computations are passed back to the head node. Often the head node has very little memory relative to the compute nodes. For example, my head node has only 32 GB of memory, compared to 256 GB on each of my compute nodes. So keep track of your memory usage.
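To illustrate the memory point, here is a minimal sketch in which each task returns only a scalar, so very little data travels back to the process that calls pmap. The function mc_sample, its pi-estimation workload, and the two local workers are assumptions made for illustration; they are not the original poster's code, and on a cluster the workers would come from a Slurm manager instead.

```julia
using Distributed

# Assumption: two local workers stand in for cluster workers.
addprocs(2)

@everywhere using Random

# Hypothetical MC sample: estimate pi from n random points and return only
# the scalar estimate, never the n points themselves.
@everywhere function mc_sample(seed::Int; n::Int = 100_000)
    rng = MersenneTwister(seed)
    hits = 0
    for _ in 1:n
        x, y = rand(rng), rand(rng)
        hits += (x^2 + y^2 <= 1.0)
    end
    return 4 * hits / n
end

# Each element of `estimates` is a single Float64 sent back by a worker.
estimates = pmap(mc_sample, 1:8)
println("mean estimate: ", sum(estimates) / length(estimates))

rmprocs(workers())
```

If a task has to produce something large, one option is to write it to disk on the worker and return only a file name or a summary.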


The results I return to the head node shouldn’t be significant in size. But thanks for the info!

I think I’m misunderstanding something about the syntax:
SLURM ? addprocs(SlurmManager(256), N=16)
gives me an error:
LoadError: syntax: colon expected in "?" expression
There is a colon in the original line, so I guess I’m missing the last part
: addprocs(256 - 1) or something like that?

There must be a way of dealing with the pmap being run on the head node in Slurm.
Perhaps we need to start a job on a compute node which then uses SlurmManager to add the workers.

Not sure what you mean here.

SLURM ? addprocs(SlurmManager(N_cpus), N=16) : addprocs(N_cpus - 1)

should work fine.
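For context on the "colon expected" error: Julia's ternary operator always needs both branches, so dropping the part after the colon is a syntax error. The sketch below shows the ternary next to two one-sided alternatives. The names use_slurm and n_workers are placeholders for illustration, and the println calls stand in for the real addprocs calls.

```julia
use_slurm = false   # hypothetical flag standing in for the SLURM constant

# Ternary: both branches are required, and the expression yields a value.
n_workers = use_slurm ? 256 : 2

# One-sided alternatives when nothing should happen in the other case:
if use_slurm
    println("would request $n_workers workers through the cluster manager")
end
use_slurm && println("same check, written in short-circuit form")

println("n_workers = $n_workers")
```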

I was just confused because I mistakenly thought I should leave out the last part of the command
: addprocs(N_cpus - 1) which was there originally.

Anyway, I seem to have found a simpler solution:
There is a package SlurmClusterManager.jl which handles all of the CPU and node allocations automatically. That way, I only need to include the lines

using Distributed, SlurmClusterManager
addprocs(SlurmManager())

in my Julia script. If I want to allocate the tasks differently, I just edit the .sh submit file.
But thanks for your help nonetheless!
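Since the original symptom was all tasks landing on the first node, it may help to check where the workers actually run before calling pmap. The following is a sketch under assumptions: two local workers stand in for the Slurm workers so that it runs anywhere, and inside an sbatch allocation the addprocs(2) line would be replaced by the SlurmClusterManager call shown above.

```julia
using Distributed

# Assumption: local stand-in. Inside an sbatch allocation, use instead:
#   using SlurmClusterManager
#   addprocs(SlurmManager())
addprocs(2)

# Ask every worker for the name of the host it is running on.
hosts = [remotecall_fetch(gethostname, w) for w in workers()]

# Count the workers per host.
per_host = Dict{String,Int}()
for h in hosts
    per_host[h] = get(per_host, h, 0) + 1
end

for h in sort(collect(keys(per_host)))
    println(h, " => ", per_host[h], " workers")
end

rmprocs(workers())
```

With --nodes=16 and --ntasks-per-node=16, one would hope to see 16 host names with 16 workers each. A single host name in the output would point to the placement problem described at the top of the thread.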

There is a difference between the two packages. The documentation for SlurmClusterManager says

Requires that SlurmManager be created inside a Slurm allocation created by sbatch/salloc. Specifically, SLURM_JOBID and SLURM_NTASKS must be defined in order to construct SlurmManager. This matches typical HPC workflows where resources are requested using sbatch and then used by the application code. In contrast, ClusterManagers.jl will dynamically request resources when run outside of an existing Slurm allocation.

In other words, you can either allocate nodes outside Julia using sbatch/srun and manage workers externally, or you can allocate everything inside Julia using ClusterManagers. I like the latter better since it keeps my code clean and concise and avoids extra script files. The N argument in addprocs(SlurmManager(N_cpus), N=16) tells ClusterManagers how many nodes to request, and it then calls srun to allocate those nodes dynamically, so it should work. No .sh files needed.

However, if you are more comfortable with manually running sbatch/srun, then yes, use SlurmClusterManager.


As I understand it, it is common practice to use a .sh file on our cluster since there are a lot of parameters (QOS, time limits, notifications) to define when using sbatch.
I don’t know if it is obligatory, but it doesn’t bother me, so I’m gonna stick with it :)

If you have dealt with the multiple-node problem, can you tell me how you did it? I am especially interested in the details of your Julia code. Thanks!