Distributed Computing with Slurm and Julia

affans · February 10, 2022, 6:39pm

I have a very similar setup to yours, and I use neither sbatch nor the bash script. I would recommend ClusterManagers.jl instead; it is a far easier solution.

Suppose you have a function do_large_computation() that you'd like to parallelize across nodes/CPUs. You can set up your script like this:

```julia
using Distributed, ClusterManagers  # addprocs, @everywhere and pmap come from Distributed
addprocs(SlurmManager(500), N=17, topology=:master_worker, exeflags="--project=.")
```

This adds 500 worker processes spread over 17 nodes (I have 32 cores per node). You can ignore the topology keyword for now. exeflags passes command-line arguments to each worker process; in my case, --project=. activates the current environment on every worker.
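Because the workers behave like ones from a local addprocs(), the same script can be tried on a single machine before asking Slurm for 17 nodes. A minimal sketch, assuming only the Distributed standard library: the launch line with SlurmManager swapped for local processes. The count of 4 workers is an arbitrary choice, not part of my setup.

```julia
using Distributed

# Local stand-in for the SlurmManager launch. The worker count (4) is an
# arbitrary choice; topology and exeflags are passed as in the Slurm version.
if nprocs() == 1
    addprocs(4; topology=:master_worker, exeflags="--project=.")
end

# Process 1 is the master; the added processes are the workers.
println("master: ", myid(), ", workers: ", workers())
```

Swapping the addprocs line back to the SlurmManager version should be the only change needed when moving to the cluster.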

Now you can run your code as if you had called addprocs() locally. For example, you can do something like

```julia
@everywhere include("file.jl") # where file.jl defines your do_large_computation() function
# or
@everywhere using PkgA # to load a package on every worker
```
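@everywhere evaluates the expression on every process, so the function ends up defined on the master and on all workers. A minimal sketch with a hypothetical body for do_large_computation (in the real setup it lives in file.jl), run on two local workers instead of the Slurm ones:

```julia
using Distributed
nprocs() == 1 && addprocs(2)

# Hypothetical stand-in for do_large_computation; in the real setup this
# definition would sit in file.jl and be loaded with @everywhere include("file.jl").
@everywhere function do_large_computation(x)
    return sum(i^2 for i in 1:x)  # placeholder workload: sum of squares
end

# The definition now exists on every process, so each worker can run it.
results = [remotecall_fetch(do_large_computation, w, 10) for w in workers()]
```

Without @everywhere, the function would exist only on the master and the remote calls would fail on the workers.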

Then, to run the function in parallel, I simply use pmap:

```julia
pmap(do_large_computation, 1:nsims)
```

which calls your function nsims times, handing each call to the next free worker on any of the nodes. The results are collected into an array and returned to the head node (more precisely, to whichever process called pmap).
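pmap returns the results in the same order as its input, regardless of which worker handled each call. A small check of that, assuming a hypothetical do_large_computation that records the id of the worker that ran it, on two local workers:

```julia
using Distributed
nprocs() == 1 && addprocs(2)

# Hypothetical simulation: each call returns its input, the id of the worker
# that ran it, and a placeholder result.
@everywhere function do_large_computation(seed)
    return (seed = seed, worker = myid(), value = seed^2)
end

nsims = 20
results = pmap(do_large_computation, 1:nsims)

# Results come back in input order, whichever worker produced them.
println([r.seed for r in results])
println("calls per worker: ", [count(r -> r.worker == w, results) for w in workers()])
```

Once workers exist, pmap sends the calls only to them, so the master stays free to collect the results.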

Let me know if you have other questions. It's also a great exercise to look at how ClusterManagers builds the srun command internally; it gives a much better understanding of what is going on.
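Short of reading the ClusterManagers source, a quick way to check the outcome of the launch is to ask each worker which host it runs on. A minimal sketch using only Distributed and Base's gethostname, shown with two local workers; after the SlurmManager launch above, the same loop would show how the 500 workers are spread over the 17 nodes.

```julia
using Distributed
nprocs() == 1 && addprocs(2)

# Ask each worker which machine it is running on. Run locally, every worker
# reports this machine's host name.
hosts = [remotecall_fetch(gethostname, w) for w in workers()]

# Count workers per host.
counts = Dict{String,Int}()
for h in hosts
    counts[h] = get(counts, h, 0) + 1
end

for (host, n) in counts
    println(host, ": ", n, " workers")
end
```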
