SLURM manager: one node with multiple tasks

Hi,

I have access to a single node with 24 cores. I want Julia to treat that node as 4 workers with 6 cores per worker.

My attempt is:

using Distributed, ClusterManagers

addprocs(SlurmManager(1), nodes=1, ntasks_per_node=4, cpus_per_task=6)

@info "STARTED"

hosts = []
pids = []
for i in workers()
    host, pid = fetch(@spawnat i (gethostname(), getpid()))
    @info "host: $host"
    @info "pid: $pid"
    push!(hosts, host)
    push!(pids, pid)
end

# The Slurm resource allocation is released when all the workers have
# exited
for i in workers()
    rmprocs(i)
end

However, I can see that only one worker is running.

Am I doing something wrong?


I would think you need to submit four separate 6-core jobs, or one 24-core job.

If I am misunderstanding the aim, please make it a bit clearer what you want to achieve.


Hi,

Thank you for the response.

After several days of playing with Distributed, I have found that there are at least two options for working with SLURM:

  1. The ClusterManagers.jl package, which launches workers through srun.
  2. The SlurmClusterManager.jl package, which works with sbatch, i.e. we create an sbatch script file and a Julia script.

With ClusterManagers, the biggest problem was that there was only one minute to allocate all nodes; otherwise an error appeared. I couldn’t overcome this, so I stopped working with it. Also, we can’t pass the srun --export argument, because export is a reserved word in Julia and cannot be written as a keyword argument name in a function call.
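The reserved-word problem can be shown in plain Julia, without Slurm. The sketch below uses a hypothetical stand-in function, srun_flags, that is not part of ClusterManagers.jl; it only illustrates that a keyword named export can still be passed by building the name with Symbol("export") and splatting it. Whether ClusterManagers forwards such a keyword to srun as --export is an assumption I have not verified.

```julia
# Hypothetical stand-in (NOT part of ClusterManagers.jl): turns keyword
# arguments into srun-style long flags, replacing underscores with hyphens.
function srun_flags(; kwargs...)
    return ["--$(replace(string(k), "_" => "-"))=$(v)" for (k, v) in kwargs]
end

# Writing `srun_flags(export="ALL")` is a parse error because `export`
# is a reserved word. Building the name as a Symbol and splatting the
# dictionary into the keyword arguments sidesteps the parser.
extra = Dict(Symbol("export") => "ALL")
flags = srun_flags(; cpus_per_task=6, extra...)

println(flags)
```

The same splatting pattern should work with any function that accepts arbitrary keyword arguments.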

With SlurmClusterManager everything looks good to me, except that progress information from the workers is written to the output file very rarely. I don’t know why, but the output file gets updated maybe once an hour, even though the output is produced roughly once every 5 seconds. That is disappointing because it is hard to see the progress, and I haven’t found a workaround yet.
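One possible cause, and this is only an assumption, is output buffering: when stdout is redirected to a file, text may be held in a buffer until it fills up. A minimal sketch of something to try is to flush explicitly after each progress message. The helper name report_progress is hypothetical and not from any package; since worker output is relayed through the master process, the flush may need to happen there as well.

```julia
# Hypothetical helper (not part of any package): print one progress line
# and flush right away so it does not sit in a buffer.
function report_progress(io::IO, step::Integer, total::Integer)
    println(io, "progress: $step / $total")
    flush(io)  # push buffered bytes to the underlying file or pipe
    return nothing
end

# Example use: report three steps on standard output.
for step in 1:3
    report_progress(stdout, step, 3)
end
```

If this changes nothing, the delay is probably introduced somewhere else, for example by the file system on the cluster.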

With SlurmClusterManager, I solved my initial task with a run.sbatch script like:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=6

julia my_script.jl
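
A my_script.jl to go with this batch script might look like the sketch below. It assumes that SlurmManager() from SlurmClusterManager.jl, called with no arguments inside an sbatch allocation, starts one worker per Slurm task (so 4 workers here). The check on SLURM_JOB_ID is my own addition so the script does not fail when run outside Slurm.

```julia
using Distributed, SlurmClusterManager

# Only add workers when running inside a Slurm allocation.
# Assumption: SlurmManager() takes the task count from the job environment.
if haskey(ENV, "SLURM_JOB_ID")
    addprocs(SlurmManager())
end

# Ask every worker for its host name and process id.
for w in workers()
    host, pid = remotecall_fetch(() -> (gethostname(), getpid()), w)
    @info "worker $w" host pid
end
```

Submitted with sbatch run.sbatch, this should log four workers on the same host, each with a different pid.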