Hello,
Sorry to revive this issue, but I am also trying to run some Julia code on a remote cluster that uses PBS. I tried the test script test_julia.jl that @ElOceanografo posted, with the update from @juliohm, i.e.:
using Distributed
using ClusterManagers
addprocs_pbs(15)
println("Hello from Julia")
np = nprocs()
println("Number of processes: $np")
for i in workers()
    host, pid = fetch(@spawnat i (gethostname(), getpid()))
    println("Hello from process $(pid) on host $(host)!")
end
tasks = randn(np * 30)
@everywhere begin
    function foo(x)
        return x * 4
    end
end
results = pmap(foo, tasks)
println(results)
for i in workers()
    rmprocs(i)
end
My submission script looks like this:
#!/bin/sh
#PBS -N test_parallel
#PBS -l walltime=24:00:00
#PBS -l nodes=1:ppn=16
#PBS -j oe
cd $PBS_O_WORKDIR
julia test.jl
Unfortunately, this produces the following warning, error and output:
┌ Warning: rmprocs: process 1 not removed
└ @ Distributed /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.1/Distributed/src/cluster.jl:928
Error launching workers
MethodError(iterate, (Base.ProcessChain(Base.Process[Process(`echo 'cd /home/puck/WORK/testParallel && /usr/bin/julia --worker=c7kM2ZZEVlrMc45M'`, ProcessRunning), Process(`qsub -N julia-316294 -j oe -k o -t 1-15`, ProcessRunning)], Base.DevNull(), Base.PipeEndpoint(RawFD(0x00000011) open, 0 bytes waiting), Base.DevNull()),), 0x00000000000063e9)
Hello from Julia
Number of processes: 1
Hello from process 316294 on host planck!
[-0.838315, 1.15875, -1.13005, -2.51021, 0.299758, -1.20761, -2.66802, 0.591652, 1.14451, -6.3455, -6.15408, 1.97041, -0.74972, -3.17471, 7.71404, 1.37577, -3.61361, 2.89938, 2.10592, 4.70652, -1.72959, -2.48799, -2.66151, -0.136183, -1.27427, -4.37823, -1.17756, 6.24257, -5.0602, -4.91916]
I’m new to HPC services as well as Julia, so I’m afraid I’m quite baffled by what’s going on. It looks like no workers are being launched at all and only the master process is running (the output reports 1 process), even though I’m using a machine with 32 logical CPUs (1 socket, 16 cores per socket, 2 threads per core).
Could anyone point out the obvious to me here?
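For comparison, since the job above already requests a whole node (nodes=1:ppn=16), here is a minimal sketch that starts the workers locally with plain addprocs from Distributed, instead of having addprocs_pbs call qsub from inside the running job. This is only an assumption about what might suit a single-node job; it does not explain or fix the addprocs_pbs error itself. The worker count of 15 is carried over from the script above, and the variable name n_workers is made up for illustration.

```julia
using Distributed

# Assumption: 15 workers plus the master process fit within the
# 16 cores requested with ppn=16 in the submission script.
n_workers = 15

# addprocs(n) starts n worker processes on the local machine;
# no qsub call is involved.
addprocs(n_workers)

np = nprocs()
println("Number of processes: $np")

# Ask every worker for its hostname and process ID.
for i in workers()
    host, pid = fetch(@spawnat i (gethostname(), getpid()))
    println("Hello from process $(pid) on host $(host)!")
end

# Define foo on the master and on all workers.
@everywhere foo(x) = x * 4

tasks = randn(np * 30)
results = pmap(foo, tasks)
println("Computed $(length(results)) results")

# Remove all workers in a single call.
rmprocs(workers())
```

If this version reports 16 processes, the Julia side of the script is probably fine and the problem lies in how addprocs_pbs launches workers through qsub. The "process 1 not removed" warning in the output above fits that picture: when no workers exist, workers() returns [1], so the cleanup loop ends up calling rmprocs(1) on the master process.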