Running Julia in a SLURM Cluster

I’m looking to provide an explicit explanation of the simplest way to get a Julia program running in parallel across multiple nodes in a SLURM cluster (basically a minimal working example that illustrates the logic).

My impression so far is that there are two primary ways to run Julia in a SLURM cluster. Suppose I want to define a function and run it in a parallel for-loop on N cores distributed across M nodes in the cluster:

Option 1

The first option comes from this Stack Overflow post. Basically, you can use some functions from the ClusterManagers package in your code and then just run Julia as normal, without having to explicitly write a SLURM script.

The example program:

# File name
# slurm_example.jl
using Distributed
using ClusterManagers

# Add N workers across M nodes
addprocs_slurm(N; nodes=M, exename="/path/to/julia/bin/julia")  # plus any other SLURM options as keyword arguments

# Define function
@everywhere function myFunction(args)
    # Code goes here...
end

# Run function K times in parallel
@sync @distributed for i in 1:K
    myFunction(args)
end

As I understand it, to run this program, I would simply execute

julia slurm_example.jl

from the command line while logged into the cluster. Then the addprocs_slurm call requests the resources and launches the worker processes as a SLURM job (the equivalent of starting them with srun and the specified options), and the rest of the Julia code runs its parallel work on those workers.
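For concreteness, here is my guess at what the call would look like with actual numbers and SLURM options; I believe any extra keyword arguments are forwarded to srun as command-line flags, but I'm not certain of the exact names:

using Distributed
using ClusterManagers

# My guess: 16 workers on 2 nodes, with the partition and time limit
# (keyword names assumed) forwarded to srun as --partition and --time
addprocs_slurm(16; nodes=2, partition="normal", time="01:00:00",
               exename="/path/to/julia/bin/julia")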

Option 2

The second option, exemplified in this post, involves writing a SLURM script for a batch job that calls Julia with the --machine-file flag. In this case, the example program is:

# File name
# slurm_example.jl
using Distributed
using ClusterManagers

# Define function
@everywhere function myFunction(args)
    # Code goes here...
end

# Run function K times in parallel
@sync @distributed for i in 1:K
    myFunction(args)
end

# Kill the workers
for i in workers()
    rmprocs(i)
end

Then to run this, I would need to write and execute a separate SLURM script that looks something like this:

#!/bin/bash

#SBATCH --ntasks=N    # N cores
#SBATCH --nodes=M     # M nodes
# Rest of the #SBATCH flags go here...

julia --machine-file=$SLURM_NODEFILE slurm_example.jl
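One thing I'm unsure of is whether $SLURM_NODEFILE is actually defined by SLURM; if it isn't, I assume the machine file could be generated inside the batch script along these lines (scontrol show hostnames prints one allocated host per line):

# Hypothetical alternative if $SLURM_NODEFILE is not set:
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodefile.txt
julia --machine-file=nodefile.txt slurm_example.jl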

One thing I find confusing about this example: I don’t understand why I wouldn’t need to do something like

 addprocs(SlurmManager(N))

in the Julia code. Or do I? Are there any glaring errors in this code? And is the main difference between the two options simply that Option 1 runs as an interactive SLURM job while Option 2 runs as a batch job?

Thanks ahead of time for any feedback.

5 Likes

I recently set up some scripts for running Julia jobs on a Slurm cluster. All of my jobs just use a single node with multiple CPUs. I’ll describe my approach, but I’m not an expert in HPC, so I’m not sure if everything that I’m doing is 100% correct.

My approach is to write two scripts: a Slurm script and a Julia script. I’m currently not using ClusterManagers. My mental model is that if I request one node with multiple CPUs, Slurm gives my job an allocation that behaves like a single machine with multiple cores, and Julia can use those cores just as it does on my laptop. So basically all I need to do is using Distributed; addprocs(4) and then parallelize my code with @distributed or pmap, as sketched below.
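For example, a minimal pmap version of that pattern looks something like this (mywork and the worker count are just placeholders for illustration):

using Distributed

# one worker per requested CPU (here, --cpus-per-task=4)
addprocs(4)

# the function has to be defined on every worker
@everywhere mywork(i) = (sleep(1); i^2)

# pmap farms the calls out to the worker processes and collects the results
results = pmap(mywork, 1:16)
println(results)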

Example 1

Slurm Script (“test_distributed.slurm”)

#!/bin/bash

#SBATCH -p <list of partition names>
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:05:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your email address>

julia test_distributed.jl

Julia Script (“test_distributed.jl”)

using Distributed

# launch worker processes
addprocs(4)

println("Number of processes: ", nprocs())
println("Number of workers: ", nworkers())

# each worker gets its id, process id and hostname
for i in workers()
    id, pid, host = fetch(@spawnat i (myid(), getpid(), gethostname()))
    println(id, " " , pid, " ", host)
end

# remove the workers
for i in workers()
    rmprocs(i)
end

Output File

Number of processes: 5
Number of workers: 4
2 2331013 cn1081
3 2331015 cn1081
4 2331016 cn1081
5 2331017 cn1081

Example 2

In this example I run a parallel for loop with @distributed. The body of the loop contains a 5-minute sleep call. I verified that the iterations really do run in parallel by recording the run time for the whole job: it came to 00:05:19, rather than the roughly 00:20:00 that would be expected if the four iterations had run serially.

Slurm Script (“test_distributed2.slurm”)

#!/bin/bash

#SBATCH -p <list of partition names>
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:30:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your email address>

julia test_distributed2.jl

Julia Script (“test_distributed2.jl”)

using Distributed

addprocs(4)

println("Number of processes: ", nprocs())
println("Number of workers: ", nworkers())

@sync @distributed for i in 1:4
    sleep(300)
    id, pid, host = myid(), getpid(), gethostname()
    println(id, " " , pid, " ", host)
end

for i in workers()
    rmprocs(i)
end

Output File

Number of processes: 5
Number of workers: 4
      From worker 2:	2 2334507 cn1081
      From worker 3:	3 2334509 cn1081
      From worker 5:	5 2334511 cn1081
      From worker 4:	4 2334510 cn1081

Comments

If you’re using a Project.toml or Manifest.toml you will probably need to call addprocs like this:

addprocs(4; exeflags="--project")

I also had to jump through some hoops to run a project that had dependencies in private GitHub repos. I think it boiled down to instantiating a Manifest.toml file that contained the appropriate links to the private GitHub repos, but I didn’t document the full process…
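If the workers might start up in a different working directory than the master, I think a slightly more robust variant is to pass the absolute path of the active project (this is just the pattern I’d try, not something I’ve tested on every cluster):

using Distributed

# dirname(Base.active_project()) is the directory holding the active Project.toml,
# so the workers load exactly the same environment as the master process
addprocs(4; exeflags="--project=$(dirname(Base.active_project()))")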

23 Likes

Awesome, this is super clear. Thanks for taking the time to write it all out. I’m not sure whether this extends to multiple nodes (needed if one node doesn’t have enough cores, etc.), but even if it doesn’t it’s a step in the right direction, so it deserves to be the accepted solution.

1 Like

Glad that was helpful. Hopefully other folks will chime in with alternative approaches. There definitely aren’t a lot of good examples or tutorials on the web for using Julia on a cluster.

My preferred way is something like this:

#!/usr/bin/env sh
#SBATCH -N 10
#SBATCH --ntasks-per-node=8
#SBATCH -o %x-%j.out
#=
srun julia $(scontrol show job $SLURM_JOBID | awk -F= '/Command=/{print $2}')
exit
# =#

using MPIClusterManagers, Distributed

# rank 0 of the MPI job becomes the Julia master; every other rank becomes a worker
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)

println(workers()) # 10 nodes × 8 tasks per node = 80 ranks, so this should list 79 workers

# parallel work goes here...

# shut the MPI workers down cleanly at the end of the script
MPIClusterManagers.stop_main_loop(manager)

You put this in myscript.jl and then sbatch myscript.jl.

This uses the “A neat Julia/SLURM trick” post and MPIClusterManagers.jl.

ClusterManagers’ ElasticManager is also quite useful for dynamically hooking up workers to, e.g., a Jupyter session, if you prefer an interactive workflow.
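Roughly, the manager side looks like the sketch below (this is from memory, so double-check the ClusterManagers README for the exact keyword arguments; the cookie is an arbitrary shared secret):

using Distributed, ClusterManagers

# start a manager that waits for workers to connect over TCP
# (sketch from memory: see the ClusterManagers README for addr/port options)
em = ElasticManager(cookie="mycookie")

# printing the manager shows the `ClusterManagers.elastic_worker(...)`
# command that each remote Julia process runs to attach itself,
# e.g. from inside an srun step
println(em)

# once workers have attached, use them as usual, e.g. pmap(f, 1:100)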

7 Likes

I find that the following is easiest, and it seems to work well for me: put this at the top of my script.jl,

#!/usr/bin/env julia

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:30:00

using Distributed
...

and then just running sbatch script.jl does the job. (The #SBATCH lines are ordinary Julia comments, so the same file is both a valid batch script and a valid Julia program.)
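For completeness, a full toy version of such a script might look like this (I’m assuming SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given, and falling back to 1 otherwise):

#!/usr/bin/env julia

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:30:00

using Distributed

# one worker per allocated CPU
addprocs(parse(Int, get(ENV, "SLURM_CPUS_PER_TASK", "1")))

# toy workload: define on all workers and run in parallel
@everywhere squared(i) = i^2
println(pmap(squared, 1:16))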

6 Likes

That’s even simpler! I’ll give it a try.