I’ve been using Julia on the cluster quite successfully with MPI.jl and the MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL) option, using native Julia constructs such as remotecall and friends after that. However, I just noticed something a little odd about how worker processes get distributed across cluster nodes, where 1 node = 104 cores.
Specifically, if I ask for 208 cores through mpirun, i.e., across 2 nodes (208 CPUs total), I get 207 workers and 1 manager process, as expected. However, the first 104 workers are placed on the first node and the remaining 103 on the second. This seems a little strange to me: if the manager is doing moderate work (e.g., in one-sided communication), the first node is oversubscribed (104 workers + 1 manager on 104 CPUs) while the second is undersubscribed (103 workers on 104 CPUs). This makes my task parallelization a little wonky, since I try to keep parallel communication within a node and minimize comms between nodes.
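To make the node-locality concern concrete, here is a minimal sketch of grouping workers by host using only stock Distributed calls (workers_by_host is just an illustrative helper name, not part of the MWE below):

using Distributed

# Group worker ids by the host they run on, by asking each worker for its hostname.
function workers_by_host()
    byhost = Dict{String,Vector{Int}}()
    for w in workers()
        h = remotecall_fetch(gethostname, w)
        push!(get!(byhost, h, Int[]), w)
    end
    return byhost
end

# e.g. build a WorkerPool from byhost["some-node"] (hypothetical host name) and
# pmap over that pool so a batch of tasks stays on a single node.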
I’m on Julia 1.8, MPI.jl v0.19.2, and MPIClusterManagers v0.2.4.
My question: Is there a way to get an even 104 processes on all nodes?
Here’s my MWE to figure out the node/task split:
# mpitest.jl
## MPI Init
using MPIClusterManagers, Distributed
import MPI
MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)
sz = MPI.Comm_size(MPI.COMM_WORLD)
if rank == 0
    @info "size is $sz"
end
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)
@info "there are $(nworkers()) workers"
@everywhere using Distributed
@everywhere begin
    function getdetails()
        [myid() gethostname()]
    end
end
using DelimitedFiles
r = reduce(vcat, pmap(x -> getdetails(), 1:nworkers()))
idx = sortperm(r[:, 1])
writedlm("test.txt", r[idx, :])
MPIClusterManagers.stop_main_loop(manager)
rmprocs(workers())
exit()
which I run from a login node on the cluster as
mpirun -np 208 julia mpitest.jl
The output is
size is 208
there are 207 workers
and test.txt looks like
2    blah-cpu-spr-0570.blah
3    blah-cpu-spr-0570.blah
4    blah-cpu-spr-0570.blah
5    blah-cpu-spr-0570.blah
6    blah-cpu-spr-0570.blah
7    blah-cpu-spr-0570.blah
.
104  blah-cpu-spr-0570.blah
105  blah-cpu-spr-0570.blah
106  blah-cpu-spr-0569.blah
107  blah-cpu-spr-0569.blah
.
207  blah-cpu-spr-0569.blah
208  blah-cpu-spr-0569.blah
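A quick way to tally the split from test.txt would be something like the following sketch (the file name and column layout are the ones written by the MWE above):

using DelimitedFiles

# Count how many workers ended up on each host, from the id/hostname pairs in test.txt.
r = readdlm("test.txt")
counts = Dict{String,Int}()
for h in r[:, 2]
    counts[string(h)] = get(counts, string(h), 0) + 1
end
@show counts   # here: 104 workers on blah-cpu-spr-0570 and 103 on blah-cpu-spr-0569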