`addprocs(["remote"])` does not work, but `ssh remote julia --version` does. Why?

#1

I see this behaviour:

➜  ~ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-apple-darwin13.4.0

julia> addprocs(["floswald@scpo-rents"])
**************************************************

              WELCOME TO THE 

          SciencesPo Rent server

This system runs:
Ubuntu 16.04.1 LTS


sh: 1: cd: can't cd to /Users/florian.oswald

after which it hangs. I can ssh into this machine without a password using ssh floswald@scpo-rents, and julia runs fine on that machine from the folder "/home/floswald/apps/julia-0.5/bin". Doing

addprocs(["floswald@scpo-rents"],exename="/home/floswald/apps/julia-0.5/bin")

yields the same outcome. Any ideas?


#2

Is there code in .juliarc.jl causing julia to try to cd to a directory that doesn’t exist on the target? (Not sure why that would hang.)


#3

@ihnorton thanks for getting back.
It always complains that it cannot cd to the working directory of the calling process, i.e. if I’m in ~ when I call addprocs it says it can’t cd to my home directory, and similarly for any other directory:

julia> addprocs(["rents"])
**************************************************

              WELCOME TO THE 

          SciencesPo Rent server

This system runs:
Ubuntu 16.04.1 LTS


sh: 1: cd: can't cd to /Users/florian.oswald/Dropbox/teaching/ScPo/ScPo-CompEcon/CoursePack
^CERROR: InterruptException:
 in parse_connection_info(::String) at ./multi.jl:1601
 in read_worker_host_port(::Pipe) at ./multi.jl:1589
 in connect(::Base.SSHManager, ::Int64, ::WorkerConfig) at ./managers.jl:385
 in create_worker(::Base.SSHManager, ::WorkerConfig) at ./multi.jl:1786
 in setup_launched_worker(::Base.SSHManager, ::WorkerConfig, ::Array{Int64,1}) at ./multi.jl:1733
 in (::Base.##649#653{Base.SSHManager,Array{Int64,1}})() at ./task.jl:360
 in sync_end() at ./task.jl:311
 in macro expansion at ./task.jl:327 [inlined]
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1688
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs_locked)(::Array{Any,1}, ::Base.#addprocs_locked, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1658
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs)(::Array{Any,1}, ::Base.#addprocs, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#744(::Bool, ::Cmd, ::Int64, ::Array{Any,1}, ::Function, ::Array{String,1}) at ./managers.jl:112
 in addprocs(::Array{String,1}) at ./managers.jl:111

also:

cat ~/.juliarc.jl
push!(LOAD_PATH,"/Users/florian.oswald/git/migration/mig/src")

#4

login shell fixed:

ssh rents julia --version
**************************************************

              WELCOME TO THE 

          SciencesPo Rent server

This system runs:
Ubuntu 16.04.1 LTS


julia version 0.5.1

#5

I am still stuck with this. I cannot do addprocs(["floswald@scpo-rents.sciences-po.fr"]), but I can do ssh floswald@scpo-rents.sciences-po.fr 'echo "println(rand(3))" | julia'.


#6

Check /home/floswald/.juliarc.jl on the target? Note that the HOME paths are different, so any hard-coded paths starting with /Users/... are not going to work on the target.
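
One way to make a startup file safe to share between machines with different home directories is to build the path from homedir() and only add it when it exists. A minimal sketch, assuming the same git/migration/mig/src layout sits under the home directory on each machine (that layout on the target is an assumption, not something confirmed in this thread):

```julia
# Sketch for ~/.juliarc.jl (Julia 1.x reads ~/.julia/config/startup.jl instead).
# Assumption: the source tree lives under the same relative path on every machine.
src = joinpath(homedir(), "git", "migration", "mig", "src")

# Only extend LOAD_PATH when the directory actually exists on this machine.
if isdir(src) && !(src in LOAD_PATH)
    push!(LOAD_PATH, src)
end
```

On a machine where the directory is missing, the file then loads without touching LOAD_PATH.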


#7

By default, addprocs tries to cd the remote workers to the host’s current directory. That is, if you’re running julia at /Users/florian.oswald on your local machine, all of the workers added with addprocs will try to do a cd /Users/florian.oswald. Try using the dir keyword to addprocs to give your workers a different working directory.
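
A quick way to see what the dir keyword does, without involving ssh, is to start a local worker in a different directory and ask it where it is. A minimal sketch, assuming Julia 0.7 or later, where addprocs lives in the Distributed standard library (on 0.5 it is in Base and the using line is not needed); tempdir() merely stands in for a directory that exists on the worker's machine:

```julia
using Distributed

# Start one local worker whose working directory is tempdir()
# (a stand-in for a path that exists on the worker's machine).
pids = addprocs(1; dir=tempdir())

# Ask the worker for its working directory; it should be the temp
# directory (possibly with symlinks resolved), not the master's pwd().
worker_dir = remotecall_fetch(pwd, pids[1])
println("master: ", pwd(), "  worker: ", worker_dir)

# Shut the worker down again.
rmprocs(pids)
```

For an ssh worker the keyword is passed the same way, with a directory that exists on the remote machine.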


#8

Thanks! I think that gets very close! I don’t know why this fails now:

julia> addprocs(["rents"],dir="/home/floswald",exename="/home/floswald/apps/julia-0.5/bin/julia")
**************************************************

              WELCOME TO THE 

          SciencesPo Rent server

This system runs:
Ubuntu 16.04.1 LTS


ERROR: connect: connection timed out (ETIMEDOUT)
 in yieldto(::Task, ::ANY) at ./event.jl:136
 in yieldto(::Task, ::ANY) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in wait() at ./event.jl:169
 in wait(::Condition) at ./event.jl:27
 in wait(::Condition) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in stream_wait(::TCPSocket, ::Condition, ::Vararg{Condition,N}) at ./stream.jl:44
 in wait_connected(::TCPSocket) at ./stream.jl:265
 in connect at ./stream.jl:960 [inlined]
 in connect_to_worker(::SubString{String}, ::Int16) at ./managers.jl:483
 in connect(::Base.SSHManager, ::Int64, ::WorkerConfig) at ./managers.jl:425
 in create_worker(::Base.SSHManager, ::WorkerConfig) at ./multi.jl:1786
 in setup_launched_worker(::Base.SSHManager, ::WorkerConfig, ::Array{Int64,1}) at ./multi.jl:1733
 in (::Base.##649#653{Base.SSHManager,Array{Int64,1}})() at ./task.jl:360
 in sync_end() at ./task.jl:311
 in macro expansion at ./task.jl:327 [inlined]
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1688
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs_locked)(::Array{Any,1}, ::Base.#addprocs_locked, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1658
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs)(::Array{Any,1}, ::Base.#addprocs, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#744(::Bool, ::Cmd, ::Int64, ::Array{Any,1}, ::Function, ::Array{String,1}) at ./managers.jl:112
 in (::Base.#kw##addprocs)(::Array{Any,1}, ::Base.#addprocs, ::Array{String,1}) at ./<missing>:0

julia> Master process (id 1) could not connect within 60.0 seconds.
exiting.


#9

I had the same problem. Connecting with tunnel=true seems to solve it for me.

(I have julia 0.5.1 on both client and server. You could try updating from 0.5.0 on your client.)
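
The ETIMEDOUT in the trace above is raised in connect_to_worker, i.e. when the master opens a direct TCP connection to the port the worker listens on. With tunnel=true that connection is forwarded over ssh instead, which can help when only the ssh port is reachable from the master. A minimal sketch of a helper that passes the option, assuming Julia 0.7 or later (Distributed standard library); the function name add_tunneled_worker is hypothetical, and host, dir and exename must be supplied for your own setup:

```julia
using Distributed

# Hypothetical helper: add one ssh worker and route the master-to-worker
# connection through an ssh tunnel rather than a direct TCP connection.
function add_tunneled_worker(host::AbstractString;
                             dir::AbstractString,
                             exename::AbstractString)
    return addprocs([host]; tunnel=true, dir=dir, exename=exename)
end
```

Called as add_tunneled_worker("rents"; dir="/home/floswald", exename="/home/floswald/apps/julia-0.5/bin/julia"), this issues the addprocs call from post #8 with tunnel=true added.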


#10

Good try, but:

julia> addprocs(["rents"],dir="/home/floswald",exename="/home/floswald/apps/julia-0.5/bin/julia",tunnel=true)
**************************************************

              WELCOME TO THE 

          SciencesPo Rent server

This system runs:
Ubuntu 16.04.1 LTS


ERROR: unable to create SSH tunnel after 0 tries. No free port?
 in ssh_tunnel(::String, ::SubString{String}, ::SubString{String}, ::UInt16, ::Cmd) at ./managers.jl:272
 in connect_to_worker(::SubString{String}, ::SubString{String}, ::Int16, ::String, ::Cmd) at ./managers.jl:498
 in connect(::Base.SSHManager, ::Int64, ::WorkerConfig) at ./managers.jl:420
 in create_worker(::Base.SSHManager, ::WorkerConfig) at ./multi.jl:1786
 in setup_launched_worker(::Base.SSHManager, ::WorkerConfig, ::Array{Int64,1}) at ./multi.jl:1733
 in (::Base.##649#653{Base.SSHManager,Array{Int64,1}})() at ./task.jl:360
 in sync_end() at ./task.jl:311
 in macro expansion at ./task.jl:327 [inlined]
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1688
 in #addprocs_locked#645(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs_locked)(::Array{Any,1}, ::Base.#addprocs_locked, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at ./multi.jl:1658
 in #addprocs#644(::Array{Any,1}, ::Function, ::Base.SSHManager) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in (::Base.#kw##addprocs)(::Array{Any,1}, ::Base.#addprocs, ::Base.SSHManager) at ./<missing>:0
 in #addprocs#744(::Bool, ::Cmd, ::Int64, ::Array{Any,1}, ::Function, ::Array{String,1}) at ./managers.jl:112
 in (::Base.#kw##addprocs)(::Array{Any,1}, ::Base.#addprocs, ::Array{String,1}) at ./<missing>:0

julia> 

Could it be a version issue, 0.5.0 on the client vs 0.5.1 on the server?


#11

Did you happen to have any luck with your issue? I’m encountering something similar but do not even have error messages (except the timeout) to guide me.

I have a little cluster of 4 machines, all sharing the same files off an NFS server. SSH to each works, but addprocs(remote) will only connect to three of the four. At this point I’m ready to wipe the problem machine and start from scratch again…
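
When only some machines fail, adding the hosts one at a time makes it easier to see which one fails and with what error. A minimal sketch, assuming Julia 0.7 or later (Distributed standard library) and that a failed addprocs surfaces as a catchable exception; the function name try_hosts and the host names in the usage note are hypothetical:

```julia
using Distributed

# Hypothetical helper: try to add each host separately and record either
# the new worker ids or the exception that addprocs threw for that host.
function try_hosts(hosts; kwargs...)
    results = Dict{String,Any}()
    for host in hosts
        try
            results[host] = addprocs([host]; kwargs...)
        catch err
            results[host] = err
        end
    end
    return results
end
```

For example, try_hosts(["node1", "node2", "node3", "node4"]; tunnel=true) returns a dictionary keyed by host, so the failing machine and its error can be inspected directly.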