Hi all,
I’ve been using Julia on the cluster quite successfully with MPIClusterManagers.jl and MPI.jl, with the MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL) option and native Julia constructs after that, such as remotecall and friends. However, I just noticed something odd about how worker processes get distributed across cluster nodes, where 1 node = 104 cores.
Specifically, if I ask for 208 cores through mpirun, i.e., 2 nodes with 104 CPUs each, I get 207 workers and 1 manager process, as expected. However, the first 104 workers land on the first node and the next 103 on the second. This seems a little strange to me: if the manager is doing moderate work (e.g., during one-sided communication), the first node is oversubscribed (104 + 1 processes on 104 CPUs) while the second is undersubscribed (103 processes on 104 CPUs). This makes my task parallelization a little wonky, since I try to keep parallel communication within a node and have minimal communication between nodes.
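To make "keeping communication within a node" concrete, the kind of grouping I have in mind looks roughly like the following (illustrative sketch only, not part of the MWE further down):

using Distributed

# Map each worker id to its hostname, then group the ids per host so tasks can
# be handed out node-locally.
host_of = Dict(w => remotecall_fetch(gethostname, w) for w in workers())
byhost = Dict{String,Vector{Int}}()
for (w, host) in host_of
    push!(get!(byhost, host, Int[]), w)
end
# byhost[host] now holds the worker ids living on that host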
I’m on Julia 1.8, MPI.jl v0.19.2, and MPIClusterManagers.jl v0.2.4.
My question: Is there a way to get an even 104 processes on all nodes?
Here’s my MWE to figure out the node/task split:
# mpitest.jl
## MPI Init
using MPIClusterManagers, Distributed
import MPI
MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)
sz = MPI.Comm_size(MPI.COMM_WORLD)
if rank == 0
    @info "size is $sz"
end
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)
@info "there are $(nworkers()) workers"
@everywhere using Distributed
@everywhere begin
    function getdetails()
        # this worker's Distributed id and hostname as a 1×2 row
        [myid() gethostname()]
    end
end
using DelimitedFiles
# collect [id hostname] rows from the workers and write them sorted by id
r = reduce(vcat, pmap(x -> getdetails(), 1:nworkers()))
idx = sortperm(r[:, 1])
writedlm("test.txt", r[idx, :])
MPIClusterManagers.stop_main_loop(manager)
rmprocs(workers())
exit()
which I run from a login node on the cluster as
mpirun -np 208 julia mpitest.jl
The output is
size is 208
there are 207 workers
and test.txt looks like
2 blah-cpu-spr-0570.blah
3 blah-cpu-spr-0570.blah
4 blah-cpu-spr-0570.blah
5 blah-cpu-spr-0570.blah
6 blah-cpu-spr-0570.blah
7 blah-cpu-spr-0570.blah
.
104 blah-cpu-spr-0570.blah
105 blah-cpu-spr-0570.blah
106 blah-cpu-spr-0569.blah
107 blah-cpu-spr-0569.blah
.
207 blah-cpu-spr-0569.blah
208 blah-cpu-spr-0569.blah
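Just to spell out the imbalance, here is a small helper (illustrative only, not part of the MWE) that tallies the hosts in test.txt:

using DelimitedFiles

# Tally how many workers ended up on each host according to test.txt
rows = readdlm("test.txt", '\t')
counts = Dict{String,Int}()
for h in rows[:, 2]
    counts[h] = get(counts, h, 0) + 1
end
@show counts   # here: 104 workers on one host, 103 on the other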