Trouble with addprocs

#1

I’m having trouble with addprocs. I have set up key-based SSH login, though the key is protected by a passphrase that I still get prompted for. I doubt that is the issue, but I mention it just in case.

I have also looked at other posts here, but they seem to be running into different problems, or are unsolved.

I am using the following call:

julia> addprocs(["admin@some.server.com"],exename="julia",dir="/home/admin")
Enter passphrase for key '/c/Users/Jeremy/.ssh/id_rsa':
ERROR: connect: connection timed out (ETIMEDOUT)
try_yieldto(::Base.##296#297{Task}, ::Task) at .\event.jl:189
...

The connection is clearly working, since if I change the exename, I get an error from the server telling me the executable does not exist:

julia> addprocs(["admin@some.server.com"],exename="notjulia",dir="/home/admin")
Enter passphrase for key '/c/Users/Jeremy/.ssh/id_rsa':
bash: notjulia: command not found
ERROR: Unable to read host:port string from worker. Launch command exited with error?
read_worker_host_port(::Pipe) at .\distributed\cluster.jl:236
...

Finally, trying to use SSH tunnels:

julia> addprocs(["admin@some.server.com"],exename="julia",dir="/home/admin",tunnel=true)
Enter passphrase for key '/c/Users/Jeremy/.ssh/id_rsa':
ERROR: unable to create SSH tunnel after 100 tries. No free port?
ssh_tunnel(::SubString{String}, ::SubString{String}, ::SubString{String}, ::UInt16, ::Cmd) at .\distributed\managers.jl:278

I am guessing that these errors are due to the server’s firewall. I therefore have two questions: how do we know which ports to open for (1) the regular (non-tunnel) workers, and (2) the SSH tunnel workers?

Thanks,
Jeremy


#2

From what you write here, the problem really seems to be the passphrase on the SSH key. This won’t work: passwordless login is a requirement. Just erase that key (both the public and the private part), regenerate it without a passphrase, and start from there.


#3

Despite the SSH connection clearly being established properly?


#4

Well, the connection only works if you supply a password. Julia won’t supply your password, so you won’t have a connection. That second error message just shows that the master couldn’t read back from the worker; it does not indicate that you established a connection.


#5

I actually can supply the password when that prompt appears.

I also get an error message if I don’t give the right dir. I am therefore fairly certain the SSH connection works fine.


#6

OK, I see. Can you SSH from the worker back to the master? In general you need passwordless SSH that works both ways.


#7

From what I understand, in non-tunnel mode, the master-worker connection does not use SSH, so that should not be the issue.

In tunnel mode, I am not sure how the SSH connection is established, but the docs don’t mention anything about needing a passwordless login from the worker. If that is needed then it obviously wouldn’t work in my current configuration.


#8

It turns out that the passphrase was the issue for the tunnel workers. I used ssh-agent to hold the key, and that fixed it.

It didn’t fix the non-tunnel workers, so that’s still probably a firewall issue, but it’s OK since I’d rather not have unsecured communication.
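
For anyone following along, here is a minimal sketch of the ssh-agent setup described above. It assumes an OpenSSH-style agent started from the same shell that launches Julia (the two shell commands in the comments are the usual way to do that, not something Julia runs for you), and it reuses the host, exename and dir from the first post. The hostname check at the end is only an illustration.

```julia
using Distributed  # Julia 0.7 and later; on 0.6 addprocs lives in Base and this line is not needed

# Assumption: before starting Julia, the key was loaded into an agent in the same shell:
#   eval "$(ssh-agent -s)"
#   ssh-add ~/.ssh/id_rsa    # asks for the passphrase once
# Julia inherits the agent's environment, so ssh no longer needs to prompt.

addprocs(["admin@some.server.com"]; exename="julia", dir="/home/admin", tunnel=true)

# Quick check that each worker answers: ask it for its hostname.
for p in workers()
    println("worker ", p, " runs on ", remotecall_fetch(gethostname, p))
end
```

With tunnel=true the master reaches the worker through a port forwarded over SSH, so the server’s firewall only has to allow the SSH port itself.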


#9

The problem with the non-tunneled workers might be https://github.com/JuliaLang/julia/pull/25126.


#10

The workers need to open a port, but if you have a firewall on the server, how do you know which ports to allow?

I guess that pull request gives a range, but I’m not sure how safe it is to leave ports open like this on a public server…
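
One possible way to avoid opening a whole range, sketched here under assumptions: the addprocs documentation describes a machine specification of the form `[user@]host[:port] [bind_addr[:port]]`, where the optional second part tells the worker which address and port to listen on. Pinning that port would mean only one known port needs to be allowed through the firewall. The address 203.0.113.5 and the port 9050 below are hypothetical placeholders, `pinned_spec` is a helper made up for this example, and I have not checked how this behaves when several workers are started on the same host.

```julia
using Distributed  # Julia 0.7 and later; on 0.6 addprocs lives in Base and this line is not needed

# Hypothetical helper: build a machine specification of the documented form
# "[user@]host[:port] [bind_addr[:port]]".
pinned_spec(user, host, bind_addr, bind_port) = "$user@$host $bind_addr:$bind_port"

# Placeholder values: 203.0.113.5 stands in for the server's public address,
# and 9050 for the single port the firewall is configured to allow.
spec = pinned_spec("admin", "some.server.com", "203.0.113.5", 9050)

# Launch one worker that listens on the pinned port (uncomment on a real setup):
# addprocs([spec]; exename="julia", dir="/home/admin")
```

If exposing even one worker port on a public server is a concern, the tunnel=true route sidesteps the question, since the worker traffic then travels over the existing SSH connection.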