Hiya,
I’m trying to add some remote workers to my local Julia process. Both the host and the remote machine run Julia 1.8.5 on Linux. I have passwordless SSH access to the remote machine, and it is reachable through its alias without any problems (i.e. ssh remotename works).
However, when I run addprocs(["remotename"]; exename, dir) (exename and dir being the proper strings), I get an error:
ERROR: TaskFailedException
nested task error: IOError: connect: host is unreachable (EHOSTUNREACH)
Stacktrace:
[1] worker_from_id(pg::Distributed.ProcessGroup, i::Int64)
@ Distributed ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:1093
[2] worker_from_id
@ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:1090 [inlined]
[3] #remote_do#170
@ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:557 [inlined]
[4] remote_do
@ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:557 [inlined]
[5] kill(manager::Distributed.SSHManager, pid::Int64, config::WorkerConfig)
@ Distributed ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/managers.jl:700
[6] create_worker(manager::Distributed.SSHManager, wconfig::WorkerConfig)
@ Distributed ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:604
[7] setup_launched_worker(manager::Distributed.SSHManager, wconfig::WorkerConfig, launched_q::Vector{Int64})
@ Distributed ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:545
[8] (::Distributed.var"#45#48"{Distributed.SSHManager, Vector{Int64}, WorkerConfig})()
@ Distributed ./task.jl:484
caused by: IOError: connect: host is unreachable (EHOSTUNREACH)
Stacktrace:
[1] wait_connected(x::Sockets.TCPSocket)
@ Sockets ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Sockets/src/Sockets.jl:529
[2] connect
@ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Sockets/src/Sockets.jl:564 [inlined]
[3] connect_to_worker(host::String, port::Int64)
@ Distributed ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/managers.jl:659
[4] connect(manager::Distributed.SSHManager, pid::Int64, config::WorkerConfig)
@ Distributed ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/managers.jl:586
[5] create_worker(manager::Distributed.SSHManager, wconfig::WorkerConfig)
@ Distributed ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:600
[6] setup_launched_worker(manager::Distributed.SSHManager, wconfig::WorkerConfig, launched_q::Vector{Int64})
@ Distributed ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:545
[7] (::Distributed.var"#45#48"{Distributed.SSHManager, Vector{Int64}, WorkerConfig})()
@ Distributed ./task.jl:484
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:436
[2] macro expansion
@ ./task.jl:455 [inlined]
[3] addprocs_locked(manager::Distributed.SSHManager; kwargs::Base.Pairs{Symbol, String, Tuple{Symbol, Symbol}, NamedTuple{(:exename, :dir), Tuple{String, String}}})
@ Distributed ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:490
[4] addprocs(manager::Distributed.SSHManager; kwargs::Base.Pairs{Symbol, String, Tuple{Symbol, Symbol}, NamedTuple{(:exename, :dir), Tuple{String, String}}})
@ Distributed ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:450
[5] #addprocs#255
@ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/managers.jl:146 [inlined]
[6] top-level scope
@ REPL[4]:1
A quick search for this error message didn’t turn up anything useful. Can somebody help me understand the root cause of this?