Job hangs - "waiting for job to start" on a PBS Cluster


#1

I’m trying to use ClusterManagers interactively on a PBS cluster, e.g.:

julia> using ClusterManagers

julia> addprocs_pbs(2, queue="default")
job id is 135963, waiting for job to start ................................................................

The job seems to hang, even though qstat shows it as running:

Job id           Name         User     Time Use  S  Queue
135963[].pippen  julia-26303  snirgaz  0         R  default

Any thoughts?


#2

I have the same issue. Did you find a solution?


#3

In my case, ClusterManagers was looking for the job output files under the wrong filename. I fixed that by changing the function filename(i) in ClusterManagers/src/qsub.jl from:

filename(i) = isPBS ? "$home/julia-$(getpid()).o$id-$i" : "$home/julia-$(getpid()).o$id.$i"

to

filename(i) = isPBS ? "$home/julia-$(getpid())-$i.o$id" : "$home/julia-$(getpid()).o$id.$i"
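To make the difference between the two PBS patterns concrete, here is a small sketch that prints both names side by side. The helper names old_pbs_name and new_pbs_name are hypothetical and only mirror the two strings quoted above; they are not part of ClusterManagers. The pid and job id are taken from the qstat output in post #1, and the home directory is made up.

```julia
# Hypothetical helpers that only mirror the two string patterns quoted above;
# they are not part of ClusterManagers.
old_pbs_name(home, pid, id, i) = "$home/julia-$pid.o$id-$i"
new_pbs_name(home, pid, id, i) = "$home/julia-$pid-$i.o$id"

# Illustrative values: pid and job id come from the qstat line in post #1,
# the home directory is an assumption.
home, pid, id = "/home/user", 26303, 135963

for i in 1:2
    println("old: ", old_pbs_name(home, pid, id, i))
    println("new: ", new_pbs_name(home, pid, id, i))
end
```

The only change is where the worker index $i goes: after the job id in the old pattern, before the ".o" suffix in the new one. Which of the two your scheduler actually produces may depend on the PBS flavour and version, so check the files on disk before patching.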

Now I can open a REPL on the head node and simply run:

using ClusterManagers
addprocs_pbs(5)
pmap(x->(sleep(1);run(`echo $x`);x^2),1:10)
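If you are not sure which naming scheme your cluster uses, a rough way to check is to list the files that actually appear while addprocs_pbs is still printing dots. This is only a sketch: candidate_output_files is a hypothetical helper, and it assumes the output files land in your home directory and start with "julia-<pid>", as the filename(i) patterns above suggest.

```julia
# Hypothetical diagnostic helper, not part of ClusterManagers.
# Returns the sorted names of files in `dir` that start with "julia-<pid>",
# i.e. candidate scheduler output files for workers launched by process `pid`.
# Note: a plain prefix match also picks up longer pids (e.g. 263 matches 2630).
function candidate_output_files(dir::AbstractString, pid::Integer)
    prefix = "julia-$pid"
    return sort(filter(name -> startswith(name, prefix), readdir(dir)))
end

# Usage: call with the pid of the Julia session that ran addprocs_pbs
# (for example from a second shell or session on the head node).
foreach(println, candidate_output_files(homedir(), getpid()))
```

Comparing the printed names with the two patterns in filename(i) should show whether the index comes before or after the ".o<jobid>" part on your system.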