I often run parallel jobs across many nodes of an HPC cluster. My university cluster uses Slurm, and I use ClusterManagers.jl to submit jobs to many nodes.
I love that it’s so easy to modify my code to run on 100s of cores!
For example, here's the beginning of my code to run on 24 (nodes) × 32 (cores per node) = 768 cores:
```julia
cd(@__DIR__)
using Pkg
Pkg.activate(".")

using Distributed, ClusterManagers

np = 768
addprocs(SlurmManager(np); exeflags="--project")
```
There is one small problem: most nodes in my university cluster's partition have 32 cores, but a few have 40. When I submit a job with the core count manually set to 768, Slurm sometimes provides more cores than that, because a few of the allocated machines have 40 cores instead of 32. Those extra cores then sit idle and do no calculations, which wastes resources.
Is there a way to automatically determine the number of cores the HPC system actually provides, and pass that number to `addprocs(SlurmManager(np); exeflags="--project")` instead of setting it manually?
On my own computer I can just call `addprocs()` and it picks the correct number of cores automatically. Is there a similar command for multi-node clusters?
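One idea I had was to read Slurm's environment variables instead of hard-coding the count. Here is a sketch, assuming the script is launched from inside an existing Slurm allocation (e.g. via `sbatch`), where I believe Slurm exports `SLURM_NTASKS` with the number of tasks it actually granted; the fallback value is just my usual manual setting:

```julia
cd(@__DIR__)
using Pkg
Pkg.activate(".")

using Distributed, ClusterManagers

# Assumption: inside an sbatch allocation, Slurm exports SLURM_NTASKS
# with the granted task count; outside one, fall back to a manual value.
np = haskey(ENV, "SLURM_NTASKS") ? parse(Int, ENV["SLURM_NTASKS"]) : 768
addprocs(SlurmManager(np); exeflags="--project")
```

I'm not sure whether this is the intended way to do it with ClusterManagers.jl, or whether it handles the mixed 32/40-core nodes correctly, so I'd appreciate confirmation or a better approach.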