Hi. I used PBS qsub to get four nodes on our cluster. I put the contents of $PBS_NODEFILE in the array hosts below. After doing “using Distributed”, I tried using addprocs as below, and I don’t understand what to do about the error message. Suggestions welcome. Thanks.
The ssh tunnel is needed on our cluster (under PBS) - otherwise the master has trouble connecting to workers.
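For readers following along, here is a minimal sketch of the kind of setup described above. It is not the original poster's code. It assumes the usual PBS convention that `$PBS_NODEFILE` lists one hostname per allocated slot, and the helper name `machines_from_nodefile` is hypothetical. `addprocs` with `tunnel=true` is the standard `Distributed` call for SSH tunneling.

```julia
using Distributed

# Hypothetical helper: turn a PBS node file into (host, count) machine specs.
# Assumes one hostname per line, repeated once per allocated slot.
function machines_from_nodefile(path::AbstractString)
    counts = Dict{String,Int}()
    order = String[]                      # preserve first-seen host order
    for line in eachline(path)
        host = String(strip(line))
        isempty(host) && continue         # skip blank lines
        if !haskey(counts, host)
            counts[host] = 0
            push!(order, host)
        end
        counts[host] += 1
    end
    return [(h, counts[h]) for h in order]
end

# Only attempt to launch workers when actually running inside a PBS job.
if haskey(ENV, "PBS_NODEFILE")
    hosts = machines_from_nodefile(ENV["PBS_NODEFILE"])
    # tunnel=true makes the master reach each worker through an SSH tunnel.
    addprocs(hosts; tunnel=true)
end
```

Each `(host, count)` tuple asks `addprocs` to start `count` workers on that host over SSH, so passwordless SSH between the nodes is assumed.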
Also, yesterday I tried addprocs_pbs in ClusterManagers, but couldn’t get that to work. It seemed to want to add an illegal qsub parameter -t with the node range (1-numnodes) that I was asking for.
@Jeff_Becker That ClusterManagers package uses an older version of PBS. I think I have a comment on the issues page there.
Sorry to be pushy here - why would you need to tunnel within a cluster?
I have managed three SGI ICE clusters, using PBS.
And two very large clusters using PBS in the Netherlands.
There is a PBS configuration using a PAM module which prevents users from sshing into cluster nodes on which they are NOT running a job. I think you are using this?
You are right - the default interface for my PBS nodes was selected incorrectly - if I specify node IPoIB addresses in my host array, it works fine without tunneling. Thanks for pushing 8^)
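A rough sketch of that fix, for anyone hitting the same problem. It assumes the IPoIB interface of each node is reachable under the short hostname plus a suffix such as `-ib`. That naming scheme and the helper `ipoib_hosts` are hypothetical, so substitute whatever names or addresses your cluster actually uses.

```julia
using Distributed

# Hypothetical helper: map node hostnames to their IPoIB names.
# The "-ib" suffix is an assumption about the cluster's naming scheme.
function ipoib_hosts(hosts; suffix::AbstractString = "-ib")
    return [string(first(split(h, '.')), suffix) for h in hosts]
end

# Only attempt to launch workers when actually running inside a PBS job.
if haskey(ENV, "PBS_NODEFILE")
    nodes = unique(filter(!isempty, strip.(readlines(ENV["PBS_NODEFILE"]))))
    # No tunnel here: workers are reached directly over the IPoIB interface.
    addprocs(ipoib_hosts(nodes))
end
```

This starts one worker per node. Passing `(host, count)` tuples instead would start several per node.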
If it doesn’t work for your PBS cluster, please open an issue. The idea is to centralize all issues related to distributed computing in a single place where people can suggest improvements and update the instructions.