Julia on Cluster with SSH Restriction


#1

I’m attempting to use Julia on a cluster (using PBS) that blocks ssh connections to anything except the head node. Based on the documentation, it seems that Julia requires passwordless ssh to start workers on cluster nodes:

The base Julia installation has in-built support for two types of clusters:

  • A local cluster specified with the -p option as shown above.
  • A cluster spanning machines using the --machinefile option. This uses a passwordless ssh login to start Julia worker processes (from the same path as the current host) on the specified machines.

I’ve tried the solutions presented on this thread, but the following errors occur: (1) ClusterManagers hangs when calling addprocs_pbs() or (2) I get a permissions error when the ssh connection is attempted.

Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
ERROR: Unable to read host:port string from worker. Launch command exited with error?
read_worker_host_port(::Pipe) at ./distributed/cluster.jl:236
connect(::Base.Distributed.SSHManager, ::Int64, ::WorkerConfig) at ./distributed/managers.jl:391
create_worker(::Base.Distributed.SSHManager, ::WorkerConfig) at ./distributed/cluster.jl:443
setup_launched_worker(::Base.Distributed.SSHManager, ::WorkerConfig, ::Array{Int64,1}) at ./distributed/cluster.jl:389
(::Base.Distributed.##33#36{Base.Distributed.SSHManager,WorkerConfig,Array{Int64,1}})() at ./task.jl:335

My SysAdmin seems unwilling to allow ssh connections to worker nodes. Is there another option for using Julia on the cluster that bypasses this problem?


#2

@Brosetti I have installed and managed PBS clusters. I think what is happening is that the PAM module for PBS is installed. Yes, this stops a user from sshing into the compute nodes. BUT while you are running a job you should be able to ssh into the compute nodes which are allocated to you.
You will find that there is an environment variable called PBS_NODEFILE which points to a file listing the compute nodes you can use.

Please give me five minutes and I will confirm the PAM module behaviour.

It could be that you have some other type of restriction though?

There is a very old-style pbs_dsh utility which might be put into a wrapper script and substituted for ssh. But let's not go there.
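
For example, from inside a running job you can inspect that allocation directly. A minimal sketch (assuming Torque/PBS conventions, where $PBS_NODEFILE holds the path to the node list, one line per allocated core):

```shell
#!/bin/bash
# Inside a PBS job, $PBS_NODEFILE names a file with one line per
# allocated core; counting duplicates gives cores-per-node. On a
# PAM-restricted cluster, "ssh -o BatchMode=yes <node>" to these
# hosts should work only while the job is running.
if [ -r "${PBS_NODEFILE:-}" ]; then
    sort "$PBS_NODEFILE" | uniq -c
fi
```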


#3

What I said is true for Torque:
http://docs.adaptivecomputing.com/torque/3-0-5/3.4hostsecurity.php

My Google-fu is exhausted in looking for the PBSPro equivalent.
May I suggest that you start an interactive job using qsub -I? Once you have started a shell on the first compute node, see if you can ssh into the others.
I guess not, as you have already shown us this output…


#4

Another option you may have is the MPI ClusterManager, see e.g. here:

This is the currently released version, there is also my version that uses one-sided MPI calls here, but it’s not merged to MPI.jl yet:

Either of these options should give you native Julia parallel calls using MPI as the communication layer, bypassing the need for ssh entirely. I have only tested this with some basic DistributedArrays stuff, so I’m not sure how well it holds up.
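
For completeness, the cluster-manager flavour of this might look roughly like the following. This is a sketch; MPIManager, addprocs, and @mpi_do are names from the MPI.jl of this era, so check the package README for your version:

```julia
# Sketch, assuming MPI.jl's MPIManager mode. mpirun does the
# launching, so no ssh is involved at any point.
using MPI

manager = MPI.MPIManager(np = 4)   # ranks are started via mpirun
addprocs(manager)                  # each MPI rank becomes a Julia worker

# run a statement on every rank
@mpi_do manager println("rank $(MPI.Comm_rank(MPI.COMM_WORLD)) on $(gethostname())")
```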


#5

I would also like to ask: how are parallel jobs run on your cluster? I would guess by using an MPI with ‘munge’ authentication.
https://dun.github.io/munge/
Which leads me to open the discussion with the Julia developers. Maybe a plugin style is needed for Julia parallel, so that it does not assume only ssh will be used.
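
In fact the hooks for that plugin style already exist: Base.Distributed defines a ClusterManager interface, and ssh is only the built-in implementation. A hypothetical skeleton (Julia 0.6 names, matching the stack trace in post #1; MungeManager and its launcher are made up for illustration):

```julia
import Base.Distributed: ClusterManager, WorkerConfig, launch, manage

# Hypothetical manager that starts workers through a site-specific
# launcher (pbs_dsh, mpirun, ...) instead of ssh.
struct MungeManager <: ClusterManager
    np::Int
end

function launch(m::MungeManager, params::Dict, launched::Array, c::Condition)
    # Start m.np julia workers with your launcher, read back each
    # worker's host:port line, push a WorkerConfig per worker onto
    # `launched`, and notify(c) as they come up.
end

# React to :register / :deregister / :interrupt / :finalize events.
manage(m::MungeManager, id::Integer, config::WorkerConfig, op::Symbol) = nothing
```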

It's worth saying that the original PBS was invented well before ssh was around, and it used the r protocols (rsh, rcp, rlogin). They have the wonderfulness of the original restriction to six-character user names, in addition to relying on dotfiles for security!


#6

Hi @barche. But surely MPI needs to be launched somehow… and there are a variety of ways in which that is done, one of them being ssh!
I may well be barking up the wrong branch here (Git pun intended).


#7

Yep, with any luck by simply doing mpirun julia mysimulation.jl

I tested this with slurm, where mpirun gets the list of nodes to run on directly from the resource manager.


#8

@barche Slurm should be using munge for the mpirun startup phase.
And yes, slurm is pretty cute; it has good integration.
I will confess that the first time I met a slurm cluster I did the normal job submission script / find the list of nodes / create a custom hostfile, till someone pointed out that all you have to do is srun… doooh
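
For comparison, a sketch of what such a job script collapses to once you let the scheduler do the work (the directives and file name are illustrative, not taken from a real setup):

```shell
#!/bin/bash
#SBATCH --ntasks=48
#SBATCH --time=00:05:00

# No hostfile needed: a slurm-aware MPI takes the allocation
# straight from the scheduler.
srun julia mysimulation.jl    # or: mpirun julia mysimulation.jl
```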


#9

Thanks for the quick replies fellas. Let’s see if I can respond to your comments/questions:

Yes, I currently use this file in my PBS script. Here is what produced the errors that I reported above:

myjob.pbs

#!/bin/bash
#PBS -l nodes=12:ppn=4,walltime=00:05:00
#PBS -N test
#PBS -q batch

julia --machinefile $PBS_NODEFILE test.jl

test.jl

println("Hello from Julia")
addprocs(48)
np = nprocs()
println("Number of processes: $np")


@sync @parallel for i=1:48
  host = gethostname()
  pid = getpid()
  sleep(2)
  println("I am $host - $pid doing loop $i")
end

PAM is an interesting feature, but I don’t think this is what my SysAdmin is using. I believe he has globally blocked all users but root from ssh-ing to anything but the head node at all times. I’ll suggest that he look into PAM as an alternative.


The process hangs when I try interactive mode using qsub -I. It never gets past qsub: waiting for job to start. I suspect this has something to do with the global ssh block.


Thanks for this suggestion! I’m getting some build errors related to MPI_C. Once I sort them out, I’ll reply back with my results.


This is my first time using the cluster, so I’ll have to look into the details.


#11

I suggest the following. HPC Admins portray themselves as Ogres. Heavy metal T-shirt? Combat boots? Black jeans?
Bring a packet of cookies. Even better some local craft beer.

Actually, ssh restrictions like this are put there to stop users doing stupid things. The admin will want you to use his/her system (*). Some explanation of how Julia works and how great it is may wake the Ogre.

(*) Women can wear combat boots and T-shirts. I refer you to Lady Fiona in Shrek.


#12

OK, feel free to post the error here or in an MPI.jl issue if you get stuck on it. Looking at your script, it should be sufficient to replace the current julia command with mpirun julia test.jl (assuming mpirun supports PBS, which it should), and enclose your test script in:

using MPI

mgr = MPI.start_main_loop(MPI.MPI_TRANSPORT_ALL)

# code

MPI.stop_main_loop(mgr)
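
Putting that together with the test.jl from post #9, the adapted script might look like this (a sketch; note that addprocs is dropped, since the workers now come from mpirun):

```julia
using MPI

mgr = MPI.start_main_loop(MPI.MPI_TRANSPORT_ALL)

# no addprocs() here: every MPI rank is already a Julia worker
println("Number of processes: $(nprocs())")

@sync @parallel for i = 1:48
    sleep(2)
    println("I am $(gethostname()) - $(getpid()) doing loop $i")
end

MPI.stop_main_loop(mgr)
```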

#13

Try doing:

export CC=mpicc
export FC=mpif90
export CXX=mpicxx

in the shell before building MPI.jl. How well CMake can find an MPI installation seems to vary widely with the CMake version, and setting these variables helps it.