How to automatically determine the number of cores provided by the cluster?

I often run parallel jobs on many nodes of an HPC cluster.
My university cluster uses Slurm, and I use ClusterManagers.jl to submit jobs to many nodes.
I love that it's so easy to modify my code to run on hundreds of cores!
For example, here's the beginning of my code to run on 24 (nodes) × 32 (cores per node) = 768 cores.

cd(@__DIR__)
using Pkg
Pkg.activate(".")

using Distributed, ClusterManagers

np = 768   # total number of workers, set by hand to match the Slurm request (24 × 32)
addprocs(SlurmManager(np); exeflags="--project")

I have a small problem: most nodes in my university cluster's partition have 32 cores, but a few have 40. Sometimes I submit a job with the number of cores set manually to 768, but Slurm ends up providing more cores because a few of the machines have 40 cores instead of 32. Those extra cores then sit idle, doing no calculations, which wastes resources.

Is there a way to automatically determine the number of cores provided by the HPC system and pass that number to addprocs(SlurmManager(np); exeflags="--project") instead of setting np=768 manually?
On my own computer I can just call addprocs() and it picks the correct number of cores automatically. Is there a similar command for multi-node clusters?
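
(For comparison, this is the single-machine behaviour I mean; as far as I know, addprocs() with no argument just uses the local CPU thread count:)

using Distributed
addprocs()    # with no argument this launches one worker per local CPU thread
nworkers()    # confirm how many workers were started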

Maybe using info from sinfo -o%c?

Hmmm… sinfo -p mypartition -o%c does tell me how many cores the partition has

CPUS
32+

But when I submit a job, I cannot control (as far as I know) exactly which nodes will be assigned to it. Sometimes I get 32-core nodes and other times 40-core nodes, so I don't think sinfo -p mypartition -o%c will help me here.
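
For reference, a node-oriented sinfo query (the partition name is just a placeholder) does show the 32/40-core mix, but it still can't tell me which of those nodes a future job will land on:

# List each node in the partition together with its CPU count.
run(`sinfo -p mypartition -N -o "%N %c"`)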

In the systems I’m familiar with, you request how many cores you want in your SLURM script. If I want 768 cores I do

#SBATCH -n 768

Is that not what you do?

To add to my previous post, you should be able to use sstat or scontrol to get the information for the job once it has been started. I’m not sure how easily it can be parsed with a script though.
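
For example, something along these lines should work from inside the allocation; I'm assuming the NumCPUs field that scontrol show job normally prints, so treat it as a sketch:

# Sketch: parse the total CPU count out of this job's scontrol record.
jobid = ENV["SLURM_JOB_ID"]
jobinfo = read(`scontrol show job $jobid`, String)
m = match(r"NumCPUs=(\d+)", jobinfo)
np = parse(Int, m.captures[1])    # total CPUs allocated to the job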

Yes, I use a SLURM script with --nodes, --ntasks-per-node and --cpus-per-task=1 like this:

#!/bin/bash
# Job name:
#SBATCH --job-name=julia_job_name
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Request nodes:
#SBATCH --nodes=24
#
# Specify number of tasks:
#SBATCH --ntasks-per-node=32
#
# Processors per task:
#SBATCH --cpus-per-task=1
#
# Wall clock limit:
#SBATCH --time=12:00:00

# Load software
module purge
module load julia/1.6.0

# Run Julia script
julia my_code.jl

But sometimes Slurm gives me, for example, 10 of the 24 nodes with 40 cores instead of 32. Since my Julia code has np=768, a bunch of cores sit idle, and because I'm charged for every core on a node whether I use it or not, it's a bit annoying to have those extra 8 cores per node doing nothing.

I assume there's some Slurm command I can use to pass my Julia code the exact number of cores that have been provided, instead of copying the numbers from my Slurm script by hand. I just need to figure out what it is, so I'm hoping for suggestions here.
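
One thing I can do in the meantime is dump every SLURM_* environment variable from inside a job, to see what information Slurm actually passes along:

# Print all Slurm-provided environment variables visible to the job script.
for (k, v) in sort(collect(ENV); by = first)
    startswith(k, "SLURM_") && println(k, " = ", v)
end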

Thanks everyone for the suggestions. I think I found a solution based on this post.

I can automatically pass the number of cores to Julia using ENV["SLURM_NTASKS"] like this:

cd(@__DIR__)
using Pkg
Pkg.activate(".")

using Distributed, ClusterManagers

np = parse(Int, ENV["SLURM_NTASKS"])   # number of tasks Slurm allocated to this job
addprocs(SlurmManager(np); exeflags="--project")

Now I only need to ask for the desired number of nodes, and Julia automatically figures out how many cores it was allocated, so all of them get used for calculations.


A final update on this.
The right way to do this is actually to use ENV["SLURM_JOB_CPUS_PER_NODE"].

ENV["SLURM_JOB_CPUS_PER_NODE"] describes the allocated nodes in a format like "32(x4),40", which for 5 nodes means 4 nodes with 32 cores and 1 node with 40 cores.

This can be parsed into the total number of cores across all nodes with Julia code like this, where np ends up being the total CPU count:

cd(@__DIR__)
using Pkg
Pkg.activate(".")

using Distributed, ClusterManagers

# Turn e.g. "32(x4),40" into "32*4,40", evaluate it, and sum over all node groups.
subs = Dict("x" => "*", "(" => "", ")" => "")
np = sum(eval(Meta.parse(replace(ENV["SLURM_JOB_CPUS_PER_NODE"], r"x|\(|\)" => s -> subs[s]))))

addprocs(SlurmManager(np); exeflags="--project")
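
To illustrate what the replace/parse step does, here it is on a sample value (not from a real job):

# Worked example on a sample SLURM_JOB_CPUS_PER_NODE value.
sample = "32(x4),40"
subs = Dict("x" => "*", "(" => "", ")" => "")
cleaned = replace(sample, r"x|\(|\)" => s -> subs[s])   # "32*4,40"
total = sum(eval(Meta.parse(cleaned)))                  # (32*4, 40) summed => 168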

Now you can just submit jobs with the following Slurm script, and the number of workers is automatically set to the number of cores you were allocated, even when the nodes have different core counts:

#!/bin/bash
# Job name:
#SBATCH --job-name=your_job_name
#
# Account:
#SBATCH --account=your_account_name
#
# Partition:
#SBATCH --partition=your_partition_name
#
# Request nodes:
#SBATCH --nodes=24
#
# Processors per task:
#SBATCH --cpus-per-task=1
#
# Wall clock limit:
#SBATCH --time=24:00:00

# Load software
module purge
module load julia/1.6.0

# Run Julia script
julia your_julia_code.jl


You can also have a look at SlurmClusterManager.jl (https://github.com/kleinhenz/SlurmClusterManager.jl), a Julia package for running code on Slurm clusters.
If I understand it correctly,

using Distributed, SlurmClusterManager
addprocs(SlurmManager())

inside your Julia script running under Slurm should use all available nodes.

Hmmm… are you sure that’s true for SlurmClusterManager?

Documentation for SlurmClusterManager says this:

“Requires that SlurmManager be created inside a Slurm allocation created by sbatch/salloc. Specifically SLURM_JOBID and SLURM_NTASKS must be defined in order to construct SlurmManager .”

If the above is exactly right, then I presume it cannot do what I want: I don't want to define SLURM_NTASKS myself, because I don't know exactly how many processors will be allocated when I request, say, 20 nodes. If I set SLURM_NTASKS to some number and I'm then allocated nodes with a higher CPU count, those extra CPUs will sit idle while I'm still charged for them.