It’s hard to say what the issue is without more information. Have you confirmed that Julia worker processes were actually launched on multiple nodes? Just a guess, but one potential culprit is SSH tunneling.
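One quick way to check is to ask each worker which machine it is running on. Here is a minimal sketch, assuming Julia 0.7/1.0 with the `Distributed` stdlib (`worker_hosts` is just a hypothetical helper name). If every entry reports the master's hostname, the workers on the other nodes never came up.

```julia
using Distributed

# Hypothetical helper: map each worker id to the hostname it runs on.
# With no workers added, workers() returns [1], i.e. the master process itself.
function worker_hosts()
    return Dict(w => remotecall_fetch(gethostname, w) for w in workers())
end

# Example (hostnames are placeholders):
# addprocs(["tera31", "tera32"]; tunnel=true)
# unique(values(worker_hosts()))  # should list both remote nodes
```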
Which version of Julia are you using? I haven’t tried 1.0 in a multi-machine setting yet, but with 0.6.4 on my research group’s cluster, `addprocs` fails to connect to workers on nodes other than the one hosting the master process unless I indicate that SSH tunneling is required. For example (where `tera31` and `tera32` are the hostnames of two nodes), this works for me:

```julia
# On Julia 0.7/1.0, run `using Distributed` first; on 0.6.x addprocs is in Base.
# Named `hosts` rather than `procs` to avoid shadowing the built-in procs() function.
hosts = ["tera31", "tera32"]
addprocs(hosts; tunnel=true)
```

If you’re in an environment where establishing connections with remote workers takes a long time for whatever reason, you can also try setting the `JULIA_WORKER_TIMEOUT` environment variable on the master process before calling `addprocs`. This makes Julia wait longer before giving up on connecting to workers.
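A minimal sketch of that, again assuming Julia 0.7/1.0 with `Distributed`. `addprocs_with_timeout` is a hypothetical wrapper, and 120 seconds is an arbitrary illustrative value (the default is 60 seconds).

```julia
using Distributed

# Hypothetical wrapper: raise JULIA_WORKER_TIMEOUT on the master, then forward
# all other arguments and keywords (e.g. tunnel=true) unchanged to addprocs.
function addprocs_with_timeout(args...; timeout_s::Real = 120, kwargs...)
    # Set before addprocs is called so the longer timeout applies to this launch.
    ENV["JULIA_WORKER_TIMEOUT"] = string(timeout_s)
    return addprocs(args...; kwargs...)
end
```

With the hostnames from the example above, this would be called as `addprocs_with_timeout(["tera31", "tera32"]; timeout_s = 120, tunnel = true)`. Forwarding `kwargs...` keeps `tunnel` and any other `addprocs` options working unchanged.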