SSH cluster Julia host address issue

I am getting an error when adding worker processes on a university cluster.
After I ssh into the cluster and start Julia, I run

using Distributed
addprocs([("username@compute-6-300",1)], tunnel=true, max_parallel=1, exename="/julia-1.6.1/bin/julia", sshflags="-vv")

I get the error


Unmatched '.

\/julia-1.6.1/bin/julia' --worker: Command not found.

**ERROR:** TaskFailedException

    nested task error: Unable to read host:port string from worker. Launch command exited with error?
    Stacktrace:
     [1] worker_from_id(pg::Distributed.ProcessGroup, i::Int64)
       @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/cluster.jl:1082
     [2] worker_from_id
       @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/cluster.jl:1079 [inlined]
     [3] #remote_do#154
       @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:486 [inlined]
     [4] remote_do
       @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:486 [inlined]
     [5] kill(manager::Distributed.SSHManager, pid::Int64, config::WorkerConfig)
       @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/managers.jl:680
     [6] create_worker(manager::Distributed.SSHManager, wconfig::WorkerConfig)
       @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/cluster.jl:593
     [7] setup_launched_worker(manager::Distributed.SSHManager, wconfig::WorkerConfig, launched_q::Vector{Int64})
       @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/cluster.jl:534
     [8] (::Distributed.var"#41#44"{Distributed.SSHManager, Vector{Int64}, WorkerConfig})()
       @ Distributed ./task.jl:411
    

Can someone please help?

What is the shell on the remote? Perhaps https://github.com/JuliaLang/julia/pull/41285 is related.


Thanks a lot, fredrikekre. Running ssh -V to check, I get

OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013

Things do not get updated often over here, I am afraid. I had a look at the link you sent, but I am not quite sure what I am supposed to do about it. Any further help would be much appreciated.

Sorry, I meant that perhaps sh on the remote is not a POSIX shell (e.g. bash). On my machine sh is bash for example:

$ ls -l $(which sh)
lrwxrwxrwx 1 root root 4 26 maj  2020 /usr/bin/sh -> bash

Thanks again for clarifying. This is what I get:

echo "$SHELL"
/bin/tcsh

Okay, I believe that is not supported correctly (see the link in my first message).

Thanks. I do not understand how the problem was fixed in that PR, but I will ask there. Thank you again.

Thank you so much! I tried many solutions posted online, but none of them worked. Thank you for mentioning this!
After the PR (https://github.com/JuliaLang/julia/pull/41485), I managed to get it working with the newly supported keyword shell=:csh. For example, I can connect with

using Distributed
addprocs([("username@compute127", numberofprocs)], tunnel=true, max_parallel=1, exename="julia executable path", sshflags="-vv", shell=:csh)

in the REPL. Thank you so much!
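In case it helps anyone else, here is a minimal sketch of how one might pick the shell value from the login shell reported by echo "$SHELL" on the remote. The helper name shell_keyword is hypothetical (it is not part of Distributed), and it assumes a Julia version whose addprocs accepts shell=:csh and shell=:posix; anything that is not csh or tcsh is simply treated as POSIX here.

```julia
# Hypothetical helper, not part of Distributed: map a login-shell path
# (e.g. the output of `echo "$SHELL"` on the remote) to a value for the
# `shell` keyword of addprocs.
function shell_keyword(shell_path::AbstractString)
    # Drop surrounding whitespace/newline and keep only the program name.
    name = basename(strip(shell_path))
    # csh and tcsh are C shells; this sketch assumes everything else is POSIX.
    return name in ("csh", "tcsh") ? :csh : :posix
end

shell_keyword("/bin/tcsh")  # :csh
shell_keyword("/bin/bash")  # :posix

# Possible use (not run here; host and path are the placeholders from above):
# addprocs([("username@compute127", 1)]; tunnel=true,
#          exename="julia executable path", shell=shell_keyword("/bin/tcsh"))
```

The result could then be passed as the shell keyword in the addprocs call above instead of hard-coding :csh.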
