This may or may not be a bug; it’s my first day using parallel Julia. I do find it odd, however, that if I call addprocs on localhost first and then on a remote machine:
addprocs(1)
addprocs([("<myotherbox>", 1)])
I get a crash, but if I add the remote worker first, all’s well:
addprocs([("<myotherbox>", 1)])
addprocs(1)
Note that I can make as many remote addprocs calls as I like and then follow them with one or more local calls, and all is well. However, once a local worker has been added, every later remote addprocs call fails. I am running the official build of Julia 0.6.2 on up-to-date Arch Linux. I am calling addprocs manually at the REPL, one call at a time. I get the following error; the second part (from ^C onward) appears only after I hit Ctrl-C:
ERROR: connect: connection refused (ECONNREFUSED)
Stacktrace:
[1] try_yieldto(::Base.##296#297{Task}, ::Task) at ./event.jl:189
[2] wait() at ./event.jl:234
[3] wait(::Condition) at ./event.jl:27
[4] stream_wait(::TCPSocket, ::Condition, ::Vararg{Condition,N} where N) at ./stream.jl:42
[5] wait_connected(::TCPSocket) at ./stream.jl:258
[6] connect at ./stream.jl:983 [inlined]
[7] connect_to_worker(::String, ::UInt16) at ./distributed/managers.jl:497
[8] connect_w2w(::Int64, ::WorkerConfig) at ./distributed/managers.jl:452
[9] connect(::Base.Distributed.DefaultClusterManager, ::Int64, ::WorkerConfig) at ./distributed/managers.jl:386
[10] connect_to_peer(::Base.Distributed.DefaultClusterManager, ::Int64, ::WorkerConfig) at ./distributed/process_messages.jl:329
[11] (::Base.Distributed.##117#118{WorkerConfig,Int64})() at ./task.jl:335
Error [connect: connection refused (ECONNREFUSED)] on 3 while connecting to peer 2. Exiting.
Worker 3 terminated.
ERROR (unhandled task failure): Version read failed. Connection closed by peer.
Stacktrace:
[1] process_hdr(::TCPSocket, ::Bool) at ./distributed/process_messages.jl:257
[2] message_handler_loop(::TCPSocket, ::TCPSocket, ::Bool) at ./distributed/process_messages.jl:143
[3] process_tcp_streams(::TCPSocket, ::TCPSocket, ::Bool) at ./distributed/process_messages.jl:118
[4] (::Base.Distributed.##99#100{TCPSocket,TCPSocket,Bool})() at ./event.jl:73
^Cfatal: error thrown and no exception handler available.
InterruptException()
jl_run_once at /buildworker/worker/package_linux64/build/src/jl_uv.c:132
process_events at ./libuv.jl:82 [inlined]
wait at ./event.jl:216
task_done_hook at ./task.jl:256
unknown function (ip: 0x7f044a9e672b)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1424 [inlined]
finish_task at /buildworker/worker/package_linux64/build/src/task.c:232
start_task at /buildworker/worker/package_linux64/build/src/task.c:275
unknown function (ip: 0xffffffffffffffff)
All of which is fairly meaningless to me. Just thought I’d bring it up.
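For anyone trying to reproduce this, here is a minimal sketch of the failing ordering together with one thing that may be worth trying. The error above says worker 3 (the remote one) was refused while connecting to peer 2 (the local one). The Julia 0.6 documentation for addprocs(np::Integer; restrict=true) states that local workers bind only to 127.0.0.1 when restrict is true. That this is the cause here is an assumption, not a confirmed diagnosis, and the restrict=false variant below is untested on this setup. "<myotherbox>" is the same placeholder hostname used above.

```julia
# Julia 0.6: addprocs is exported from Base, so no `using Distributed` is needed.

# Ordering that crashes (as reported above): local worker first, then remote.
#   addprocs(1)
#   addprocs([("<myotherbox>", 1)])

# Untested guess: start the local worker without restricting it to the
# loopback interface, so that a remote worker can connect to it as a peer.
addprocs(1; restrict=false)

# Then add the remote worker over SSH ("<myotherbox>" is a placeholder host).
addprocs([("<myotherbox>", 1)])

# List the worker ids to check that both workers are still alive.
println(workers())
```

If the crash persists with restrict=false, the loopback-binding guess is wrong and the cause lies elsewhere.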