Julia on Cluster with SSH Restriction

I know I’m a little late to this thread, but I had the same problem, so I hope this helps others. My cluster uses PBS, and SSH communication between nodes is not allowed. My Julia code uses DistributedArrays together with @sync and @async blocks, and I didn’t want to modify it much, so I followed the approach suggested by @barche:

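using Distributed   # addprocs, workers, rmprocs, @everywhere
using MyModules, MPI

# serial part of the code
# set the options for the parallel run here, e.g. num_workers = 32

# parallel start (MPIManager API of MPI.jl at the time; later moved to MPIClusterManagers.jl)
mgr = MPI.start_main_loop(MPI.MPI_TRANSPORT_ALL)

addprocs(num_workers)   # local worker processes, no SSH involved
@info "workers are $(workers())"
@everywhere pwd() in LOAD_PATH || push!(LOAD_PATH, pwd())   # let the workers find MyModules
@everywhere using Distributed, MyModules

# parallel code here using MyModules.foo(options, data)

rmprocs(workers())        # stop the workers first,
MPI.stop_main_loop(mgr)   # then shut down the MPI manager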
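The "parallel code here" step depends on the author's own MyModules, so purely as a rough illustration, here is a minimal sketch of the @sync/@async pattern the post mentions, using only the Distributed standard library. foo, options and the data chunks are hypothetical stand-ins for MyModules.foo(options, data), and two local workers stand in for the real worker count; DistributedArrays is not used here.

```julia
using Distributed

# Two local workers stand in for the addprocs(...) call in the script.
addprocs(2)

# Hypothetical stand-in for MyModules.foo(options, data); it must be
# defined on every process, hence @everywhere.
@everywhere function foo(options, data)
    return options.scale * sum(data)
end

options = (scale = 2,)
chunks = [collect(1:10), collect(11:20), collect(21:30), collect(31:40)]
results = Vector{Int}(undef, length(chunks))

# @sync waits for every @async task; each task blocks only on its own
# remote call, so the chunks are processed concurrently on the workers.
wks = workers()
@sync for (i, chunk) in enumerate(chunks)
    w = wks[mod1(i, length(wks))]   # round-robin assignment of chunks to workers
    @async results[i] = remotecall_fetch(foo, w, options, chunk)
end

println(results)
rmprocs(workers())
```

Because each @async task yields while waiting on remotecall_fetch, the loop dispatches all chunks before @sync blocks, which is why this style needs little change when the workers come from a different cluster manager.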

Crucially, I had to launch the script through mpirun with a single MPI process:
mpirun -np 1 julia MyCode.jl

Here’s the complete PBS script, which requests 32 CPUs (ncpus=32) for 32 workers:

#PBS -P blah
#PBS -q myque
#PBS -l ncpus=32
#PBS -l mem=256GB
#PBS -l walltime=00:15:00
#PBS -l wd
#PBS -N testJulia
#PBS -o grid.out
#PBS -e grid.err
#PBS -j oe
# -j oe merges stderr into grid.out, so the -e line above has no effect

ulimit -s unlimited
ulimit -c unlimited
module load gcc/5.2.0 openmpi/3.0.1 julia/1.1.1
mpirun -np 1 julia ./MyCode.jl > outfile.run
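To keep the worker count in the Julia script in step with the ncpus request, one option is to read it from the job environment instead of hard-coding it. This is a sketch under assumptions: requested_workers is a hypothetical helper, and the variable names PBS_NCPUS and NCPUS are assumed (which ones PBS sets varies by site and PBS version, so check env inside a job first).

```julia
# Hypothetical helper: derive the worker count from the PBS job environment
# so it matches "#PBS -l ncpus=...". Variable names are assumptions.
function requested_workers(env = ENV; default = Sys.CPU_THREADS)
    for key in ("PBS_NCPUS", "NCPUS")
        if haskey(env, key)
            n = tryparse(Int, env[key])
            n !== nothing && n > 0 && return n
        end
    end
    # Neither variable set or parsable (e.g. running outside PBS).
    return default
end

num_workers = requested_workers()
@info "requesting $num_workers workers"
```

The result can then be passed to addprocs in the serial part of the script; outside a PBS job it falls back to the local thread count, which keeps the script usable for quick tests.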
