Trying to launch Julia cluster under PBS

Hi. I used PBS qsub to get four nodes on our cluster. I put the contents of $PBS_NODEFILE in the array hosts below. After doing “using Distributed”, I tried using addprocs as below, and I don’t understand what to do about the error message. Suggestions welcome. Thanks.

-Jeff

**julia>** hosts = ["r409i4n16","r433i2n4","r433i2n8","r433i3n11"]

4-element Array{String,1}:

 "r409i4n16"

 "r433i2n4"

 "r433i2n8"

 "r433i3n11"

**julia>** addprocs(hosts,tunnel="true",exename="/u/jcbecker/julia-1.5.1/bin/julia")

 

exception launching on machine r409i4n16 : TypeError(:if, "", Bool, "true")

exception launching on machine r433i2n8 : TypeError(:if, "", Bool, "true")

exception launching on machine r433i3n11 : TypeError(:if, "", Bool, "true")

exception launching on machine r433i2n4 : TypeError(:if, "", Bool, "true")

Int64[]
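For anyone following along: the hosts array above was typed in by hand, but it can also be built from the node file. This is only a minimal sketch, assuming $PBS_NODEFILE lists one hostname per line (possibly repeated, once per allocated slot). read_pbs_hosts is a made-up helper name, not something provided by Distributed.

```julia
# Hypothetical helper: collect the distinct hostnames from a PBS node file.
# Assumes one hostname per line; blank lines are dropped, repeats collapsed.
function read_pbs_hosts(path::AbstractString)
    names = strip.(readlines(path))
    return unique(String.(filter(!isempty, names)))
end

# Inside a PBS job the scheduler sets PBS_NODEFILE; outside one this is skipped.
if haskey(ENV, "PBS_NODEFILE")
    hosts = read_pbs_hosts(ENV["PBS_NODEFILE"])
    println(hosts)
end
```

The resulting Vector{String} can be passed to addprocs in place of the hand-written array.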

You could try using ClusterManagers.jl, though I think the PBS support there might not be working.

The tunnel keyword takes the Boolean value true or false, not the string “true”.
Try tunnel=true
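To see where the TypeError comes from: Julia never converts a String to a Bool, so any conditional that receives "true" throws. The sketch below reproduces that with a plain function and then wraps the corrected call. describe_tunnel and launch_workers are hypothetical names used only for illustration, and nothing is launched until launch_workers is actually called.

```julia
using Distributed

# A condition must be a Bool; a String here throws a TypeError.
describe_tunnel(flag) = flag ? "tunneling" : "direct"

err = try
    describe_tunnel("true")   # String, as in the failing addprocs call
catch e
    e
end
println(typeof(err))          # TypeError

# Hypothetical wrapper around the corrected call: tunnel is the Bool true.
function launch_workers(hosts; exename="/u/jcbecker/julia-1.5.1/bin/julia")
    return addprocs(hosts; tunnel=true, exename=exename)
end
```

Calling launch_workers(hosts) from inside the PBS job should then return the new worker ids instead of an empty Int64[].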

An SGI ICE cluster with over 400 racks! I am impressed!

Also, I admit my ignorance: why use an ssh tunnel to reach the cluster nodes? You should be able to reach them directly.
Clue stick coming my way…

Thanks - that worked

The ssh tunnel is needed on our cluster (under PBS) - otherwise the master has trouble connecting to workers.

Also, yesterday I tried addprocs_pbs in ClusterManagers, but couldn’t get that to work. It seemed to want to add an illegal qsub parameter -t with the node range (1-numnodes) that I was asking for.

@Jeff_Becker That ClusterManagers package uses an older version of PBS. I think I have a comment on the issues page there.

Sorry to be pushy here - why would you need to tunnel within a cluster?
I have managed three SGI ICE clusters, using PBS.
And two very large clusters using PBS in the Netherlands.

There is a PBS configuration using a PAM module which prevents users from sshing into cluster nodes unless they are running a job on them. I think you are using this?

You are right - the default interface for my PBS nodes was selected incorrectly - if I specify node IPoIB addresses in my host array, it works fine without tunneling. Thanks for pushing 8^)
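In case it helps someone else, here is a rough sketch of that final setup. It assumes each node's IPoIB name is its regular hostname plus a site-specific suffix. The "-ib0" suffix below is only a placeholder (naming differs between sites), and ib_hosts and launch_direct are hypothetical helper names.

```julia
using Distributed

# Hypothetical helper: derive IPoIB hostnames by appending a site-specific suffix.
# The suffix is an assumption; check how your cluster names its IPoIB interfaces.
ib_hosts(hosts, suffix) = [h * suffix for h in hosts]

# Direct launch without an ssh tunnel (tunnel defaults to false in addprocs).
function launch_direct(hosts; exename="/u/jcbecker/julia-1.5.1/bin/julia")
    return addprocs(hosts; exename=exename)
end

hosts = ["r409i4n16", "r433i2n4", "r433i2n8", "r433i3n11"]
println(ib_hosts(hosts, "-ib0"))   # placeholder suffix
```

With the IPoIB names (or the corresponding IP addresses) in the array, launch_direct(ib_hosts(hosts, "-ib0")) would be the tunnel-free equivalent of the earlier call.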

@Jeff_Becker take a look at this guide: https://github.com/juliohm/julia-distributed-computing

If it doesn’t work for your PBS cluster, please open an issue. The idea is to centralize all issues related to distributed computing in a single place where people can suggest improvements and update the instructions.