`addprocs(["remote"])` does not work, but `ssh remote julia --version` does. Why?

I see this behaviour:

➜  ~ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-apple-darwin13.4.0

julia> addprocs(["floswald@scpo-rents"])
**************************************************

              WELCOME TO THE 

          SciencesPo Rent server

This system runs:
Ubuntu 16.04.1 LTS


sh: 1: cd: can't cd to /Users/florian.oswald

after which it hangs. I can ssh into this machine without a password using ssh floswald@scpo-rents, and julia runs fine on that machine from the folder "/home/floswald/apps/julia-0.5/bin". Doing

addprocs(["floswald@scpo-rents"],exename="/home/floswald/apps/julia-0.5/bin")

yields the same outcome. Any ideas?

Is there code in .juliarc.jl causing julia to try to cd to a directory that doesn’t exist on the target? (Not sure why that would hang.)

@ihnorton thanks for getting back.
It always complains that it cannot cd, on the remote machine, to the working directory of the calling process. I.e. if I’m in ~ when I call addprocs, it says it can’t cd to my local home directory, and similarly for any other directory:

julia> addprocs(["rents"])
**************************************************

              WELCOME TO THE 

          SciencesPo Rent server

This system runs:
Ubuntu 16.04.1 LTS


sh: 1: cd: can't cd to /Users/florian.oswald/Dropbox/teaching/ScPo/ScPo-CompEcon/CoursePack
^CERROR: InterruptException:
 in parse_connection_info(::String) at ./multi.jl:1601
 in read_worker_host_port(::Pipe) at ./multi.jl:1589
 in connect(::Base.SSHManager, ::Int64, ::WorkerConfig) at ./managers.jl:385
 in create_worker(::Base.SSHManager, ::WorkerConfig) at ./multi.jl:1786
 in setup_launched_worker(::Base.SSHManager, ::WorkerConfig, ::Array{Int64,1}) at ./multi.jl:1733
 in (::Base.##649#653{Base.SSHManager,Array{Int64,1}})() at ./task.jl:360
 in sync_end() at ./task.jl:311
 in macro expansion at ./task.jl:327 [inlined]
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1688
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs_locked)(::Array{Any,1}, ::Base.#addprocs_locked, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1658
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs)(::Array{Any,1}, ::Base.#addprocs, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#744(::Bool, ::Cmd, ::Int64, ::Array{Any,1}, ::Function, ::Array{String,1}) at ./managers.jl:112
 in addprocs(::Array{String,1}) at ./managers.jl:111

also:

cat ~/.juliarc.jl
push!(LOAD_PATH,"/Users/florian.oswald/git/migration/mig/src")

login shell fixed:

ssh rents julia --version
**************************************************

              WELCOME TO THE 

          SciencesPo Rent server

This system runs:
Ubuntu 16.04.1 LTS


julia version 0.5.1

I am still stuck with this. I cannot do addprocs(["floswald@scpo-rents.sciences-po.fr"]), but I can do ssh floswald@scpo-rents.sciences-po.fr 'echo "println(rand(3))" | julia'.

Check /home/floswald/.juliarc.jl on the target? Note that the HOME paths are different, so any hard-coded paths starting with /Users/... are not going to work on the target.
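One way to keep a startup file free of hard-coded home paths is to build them from homedir() and skip directories that are missing on the current machine. This is only a minimal sketch, not the poster's actual file: push_if_dir! is a hypothetical helper, and the git/migration/mig/src layout is copied from the .juliarc.jl shown earlier in the thread and assumed to sit under the home directory.

```julia
# Hypothetical helper: append `dir` to `paths` only if it exists on this machine.
function push_if_dir!(paths, dir)
    isdir(dir) && push!(paths, dir)
    return paths
end

# homedir() resolves to /Users/... on the Mac and /home/... on the Ubuntu server,
# so the same file can be used on both without a hard-coded prefix.
push_if_dir!(LOAD_PATH, joinpath(homedir(), "git", "migration", "mig", "src"))
```

On a machine where the directory is absent, LOAD_PATH is simply left unchanged.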

By default, addprocs tries to cd the remote workers to the host’s current directory. That is, if you’re running julia at /Users/florian.oswald on your local machine, all of the workers added with addprocs will try to do a cd /Users/florian.oswald. Try using the dir keyword to addprocs to give your workers a different working directory.
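As a rough sketch of that suggestion, the remote-side paths can be collected in one place and passed to addprocs as keywords. The paths and the "rents" SSH alias are taken from this thread; remote_opts is a hypothetical helper, and the actual addprocs call is left commented out because it needs a reachable host.

```julia
using Distributed  # Julia 0.7 and later; on 0.5, as in this thread, addprocs lives in Base

# Hypothetical helper that keeps the remote-side paths together.
# `dir` is the worker's working directory on the remote host,
# `exename` is the julia executable on the remote host (the binary, not its folder).
remote_opts(home, julia) = (dir = home, exename = julia)

opts = remote_opts("/home/floswald", "/home/floswald/apps/julia-0.5/bin/julia")

# Launch a worker on the remote host; uncomment once the host is reachable.
# addprocs(["rents"]; opts...)
```

With dir set, the worker should no longer attempt to cd to the local /Users/... directory.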


Thanks! I think that gets very close! I don’t know why this fails now:

julia> addprocs(["rents"],dir="/home/floswald",exename="/home/floswald/apps/julia-0.5/bin/julia")
**************************************************

              WELCOME TO THE 

          SciencesPo Rent server

This system runs:
Ubuntu 16.04.1 LTS


ERROR: connect: connection timed out (ETIMEDOUT)
 in yieldto(::Task, ::ANY) at ./event.jl:136
 in yieldto(::Task, ::ANY) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in wait() at ./event.jl:169
 in wait(::Condition) at ./event.jl:27
 in wait(::Condition) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in stream_wait(::TCPSocket, ::Condition, ::Vararg{Condition,N}) at ./stream.jl:44
 in wait_connected(::TCPSocket) at ./stream.jl:265
 in connect at ./stream.jl:960 [inlined]
 in connect_to_worker(::SubString{String}, ::Int16) at ./managers.jl:483
 in connect(::Base.SSHManager, ::Int64, ::WorkerConfig) at ./managers.jl:425
 in create_worker(::Base.SSHManager, ::WorkerConfig) at ./multi.jl:1786
 in setup_launched_worker(::Base.SSHManager, ::WorkerConfig, ::Array{Int64,1}) at ./multi.jl:1733
 in (::Base.##649#653{Base.SSHManager,Array{Int64,1}})() at ./task.jl:360
 in sync_end() at ./task.jl:311
 in macro expansion at ./task.jl:327 [inlined]
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1688
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs_locked)(::Array{Any,1}, ::Base.#addprocs_locked, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1658
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs)(::Array{Any,1}, ::Base.#addprocs, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#744(::Bool, ::Cmd, ::Int64, ::Array{Any,1}, ::Function, ::Array{String,1}) at ./managers.jl:112
 in (::Base.#kw##addprocs)(::Array{Any,1}, ::Base.#addprocs, ::Array{String,1}) at ./<missing>:0

julia> Master process (id 1) could not connect within 60.0 seconds.
exiting.

I had the same problem. Connecting with tunnel=true seems to solve it for me.

(I have julia 0.5.1 on both client and server. You could try updating from 0.5.0 on your client.)
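The ETIMEDOUT in the trace above is raised inside connect_to_worker, i.e. when the master opens a direct TCP connection to the worker; tunnel=true sends that connection through SSH instead. To check whether a direct connection is possible at all, something like the following could be run on the master. It is a sketch under assumptions: port_reachable is a hypothetical helper, and the port in the usage comment is a placeholder rather than a value from the thread.

```julia
using Sockets  # Julia 0.7 and later; on 0.5 these functions are in Base

# Hypothetical helper: true if a TCP connection to host:port can be opened.
# Against a firewalled port this may block until the operating system gives up.
function port_reachable(host, port)
    try
        sock = connect(host, port)
        close(sock)
        return true
    catch
        return false
    end
end

# Usage (placeholder port; substitute the port the worker actually listens on):
# port_reachable("scpo-rents.sciences-po.fr", 9009)
```

If this returns false or times out while plain ssh works, a firewall between the two machines is a plausible cause, which would fit tunnel=true helping.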

Good try, but:

julia> addprocs(["rents"],dir="/home/floswald",exename="/home/floswald/apps/julia-0.5/bin/julia",tunnel=true)
**************************************************

              WELCOME TO THE 

          SciencesPo Rent server

This system runs:
Ubuntu 16.04.1 LTS


ERROR: unable to create SSH tunnel after 0 tries. No free port?
 in ssh_tunnel(::String, ::SubString{String}, ::SubString{String}, ::UInt16, ::Cmd) at ./managers.jl:272
 in connect_to_worker(::SubString{String}, ::SubString{String}, ::Int16, ::String, ::Cmd) at ./managers.jl:498
 in connect(::Base.SSHManager, ::Int64, ::WorkerConfig) at ./managers.jl:420
 in create_worker(::Base.SSHManager, ::WorkerConfig) at ./multi.jl:1786
 in setup_launched_worker(::Base.SSHManager, ::WorkerConfig, ::Array{Int64,1}) at ./multi.jl:1733
 in (::Base.##649#653{Base.SSHManager,Array{Int64,1}})() at ./task.jl:360
 in sync_end() at ./task.jl:311
 in macro expansion at ./task.jl:327 [inlined]
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1688
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs_locked)(::Array{Any,1}, ::Base.#addprocs_locked, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1658
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs)(::Array{Any,1}, ::Base.#addprocs, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#744(::Bool, ::Cmd, ::Int64, ::Array{Any,1}, ::Function, ::Array{String,1}) at ./managers.jl:112
 in (::Base.#kw##addprocs)(::Array{Any,1}, ::Base.#addprocs, ::Array{String,1}) at ./<missing>:0

julia> 

Could it be a version issue? 0.5 vs 0.5.1?

Did you happen to have any luck with your issue? I’m encountering something similar but do not even have error messages (except the timeout) to guide me.

I have a little cluster of 4 machines, all sharing the same files off an NFS server. SSH to each works, but addprocs(remote) will only connect to 3 of the 4. At this point I’m ready to wipe the problem machine and start from scratch again…
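To narrow down which machine fails and with what error, the hosts can be added one at a time instead of in a single addprocs call. The sketch below is one possible way to do that: try_hosts and its launcher keyword are hypothetical, the host names in the usage comment are placeholders, and a launch that hangs rather than throwing will still block the loop.

```julia
using Distributed  # Julia 0.7 and later; on 0.5 addprocs lives in Base

# Hypothetical helper: try each host separately and record either the new
# worker ids or the exception that was thrown for that host.
function try_hosts(hosts; launcher = h -> addprocs([h]))
    results = Dict{String,Any}()
    for h in hosts
        results[h] = try
            launcher(h)
        catch err
            err
        end
    end
    return results
end

# Usage (placeholder host names):
# try_hosts(["node1", "node2", "node3", "node4"])
```

Passing a custom launcher, for example one that adds tunnel=true or dir, makes it possible to compare settings per host.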