Julia on cluster, only MPI transport allowed

Hi all,

I’m on a PBS cluster where I cannot simply provide a machine file to julia. After much trial and error, I found that only MPI transport works on this cluster, and I was able to adapt an MPIClusterManagers.jl example using MPI_TRANSPORT_ALL.

However, I’ve come across a rather strange phenomenon. If I request 4 CPUs from PBS and start my MPI job as

mpirun -np 4 julia myscript.jl

where myscript.jl contains the following MWE

using MPIClusterManagers, Distributed
import MPI

MPI.Init()  # MPI must be initialized before any direct MPI calls
rank = MPI.Comm_rank(MPI.COMM_WORLD)
size = MPI.Comm_size(MPI.COMM_WORLD)

# rank 0 becomes the Julia master; all other ranks become workers
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)
@info "workers are $(workers())"

MPIClusterManagers.stop_main_loop(manager)  # shut the cluster down cleanly

I get: [ Info: workers are [2, 3, 4]

I don’t get to use the 4th CPU! Of course, if I ask PBS for 5 CPUs, I get to use 4. So with every PBS request I’m paying for one CPU I can’t use, i.e. 1/ncpus_requested extra CPU time. addprocs(1) simply oversubscribes without using the last CPU.

Any ideas on how to fix this?


What is size? Do you get a communicator size equal to the number of requested processes? Recall that workers() returns one fewer than the total number of processes (i.e., it excludes the master).
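The master-plus-workers accounting can be seen with plain Distributed, no MPI involved (a minimal sketch):

```julia
# The master process counts towards nprocs() but not towards workers().
using Distributed

addprocs(3)        # start 3 worker processes

@show nprocs()     # 4: master + 3 workers
@show nworkers()   # 3: workers only
@show workers()    # [2, 3, 4]: the master is always id 1
```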


You can try submitting a 4 CPU job but running with:

mpirun --oversubscribe -np 5

I think this is normal for MPIManager. From https://github.com/JuliaParallel/MPIClusterManagers.jl, one of the modes executes MPI code only on the workers:

MPIManager: only workers execute MPI code

An example is provided in examples/juliacman.jl. The julia master process is NOT part of the MPI cluster. The main script should be launched directly; MPIManager internally calls mpirun to launch julia/MPI workers. All the workers started via MPIManager will be part of the MPI cluster.
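A minimal sketch of that MPIManager mode, assuming a working MPI installation that MPI.jl can find (run with plain `julia`, not mpirun):

```julia
using MPIClusterManagers, Distributed

# MPIManager calls mpirun internally; the master stays outside the MPI cluster
manager = MPIManager(np=4)
addprocs(manager)

# MPI code runs only on the workers, e.g. via @mpi_do:
@mpi_do manager begin
    import MPI
    rank = MPI.Comm_rank(MPI.COMM_WORLD)
    println("hello from MPI rank $rank")
end
```

This won't run on a machine without an MPI library installed, so treat it as a sketch of the mode rather than a tested script.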


@mcreel, that’s almost the right answer; you sent me down the right path. I wasn’t using the paradigm you linked, where only the workers execute MPI code. I was instead using the one where the master and the workers are all part of the MPI cluster (MPI_TRANSPORT_ALL).

When I switched to the workers-only-use-MPI mode, this MWE produced the results I expected:

using MPIClusterManagers, Distributed
manager = MPIManager(np=4)
addprocs(manager)  # MPIManager launches the workers via its own internal mpirun
@info "workers are $(workers())"

I get: [ Info: workers are [2, 3, 4, 5]

Please note that this is NOT run through mpirun. I simply ran it, after requesting 4 CPUs from PBS, with
julia myscript.jl >& outfile
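For completeness, a hypothetical PBS submission script along these lines (the resource-request syntax and job name here are assumptions; it varies between clusters):

```shell
#!/bin/bash
#PBS -N julia_mpimanager
#PBS -l select=1:ncpus=4
#PBS -l walltime=01:00:00

cd "$PBS_O_WORKDIR"
# no mpirun here: MPIManager launches the MPI workers itself
julia myscript.jl >& outfile
```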

My parallel Julia code with @sync, @async, and remotecall_fetch worked as expected, with near-90% CPU utilisation on all 4 requested cores of the PBS-assigned node.
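For reference, the @sync/@async/remotecall_fetch pattern I mean looks roughly like this (the function and variable names are my own placeholders, not the real code):

```julia
using Distributed
addprocs(4)  # on the cluster these would be the MPIManager workers

# placeholder for the real per-chunk computation
@everywhere work(chunk) = sum(abs2, chunk)

chunks = [rand(10_000) for _ in 1:nworkers()]
results = Vector{Float64}(undef, nworkers())

# the master fans out one chunk per worker and waits for all fetches
@sync for (i, w) in enumerate(workers())
    @async results[i] = remotecall_fetch(work, w, chunks[i])
end

@info "total = $(sum(results))"
```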


Hi Sparrowhawk. Can you say more about what you mean when you say you cannot supply a machine file?

Is this related to the MPI job launch mechanism? Are you using an authentication method called Munge?

Hi @johnh, I’m not sure what the authentication method is. However, if I log in with an interactive PBS job using qsub -I and try specifying a machine file like so, I get an error:

julia --machine-file=$PBS_NODEFILE
Host key verification failed.

The PBS_NODEFILE does exist, though, and I can echo $PBS_NODEFILE successfully.
An admin on our cluster told me that only MPI communication is allowed between nodes, which is why I resorted to using MPIClusterManagers.jl.


It does sound like Munge authentication is being used here, but I may be on the wrong track.