Error starting Distributed on Linux with Julia 1.10.x

I am unable to use Distributed in Julia 1.10.x on a Linux machine. It works fine on macOS ARM, so I am not sure whether this is general enough to open an issue. It also works as expected with Julia 1.9.4 on the same machine.

Here’s the error message on Julia 1.10.4:

$ julia -p 4
ERROR: Unable to load dependent library /opt/local/julia/julia-1.10.4/bin/../lib/julia/libjulia-codegen.so.1.10
Message:libLLVM-15jl.so: failed to map segment from shared object
ERROR: TaskFailedException

    nested task error: Unable to read host:port string from worker. Launch command exited with error?
    Stacktrace:
     [1] worker_from_id(pg::Distributed.ProcessGroup, i::Int64)
       @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:1093
     [2] worker_from_id
       @ /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:1090 [inlined]
     [3] remote_do
       @ /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:557 [inlined]
     [4] kill(manager::Distributed.LocalManager, pid::Int64, config::WorkerConfig; exit_timeout::Int64, term_timeout::Int64)
       @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/managers.jl:738
     [5] kill
       @ /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/managers.jl:736 [inlined]
     [6] create_worker(manager::Distributed.LocalManager, wconfig::WorkerConfig)
       @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:604
     [7] setup_launched_worker(manager::Distributed.LocalManager, wconfig::WorkerConfig, launched_q::Vector{Int64})
       @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:545
     [8] (::Distributed.var"#45#48"{Distributed.LocalManager, Vector{Int64}, WorkerConfig})()
       @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:501

    caused by: Unable to read host:port string from worker. Launch command exited with error?
    Stacktrace:
     [1] read_worker_host_port(io::Base.PipeEndpoint)
       @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:330
     [2] connect(manager::Distributed.LocalManager, pid::Int64, config::WorkerConfig)
       @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/managers.jl:575
     [3] create_worker(manager::Distributed.LocalManager, wconfig::WorkerConfig)
       @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:600
     [4] setup_launched_worker(manager::Distributed.LocalManager, wconfig::WorkerConfig, launched_q::Vector{Int64})
       @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:545
     [5] (::Distributed.var"#45#48"{Distributed.LocalManager, Vector{Int64}, WorkerConfig})()
       @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:501
Stacktrace:
  [1] sync_end(c::Channel{Any})
    @ Base ./task.jl:448
  [2] macro expansion
    @ ./task.jl:480 [inlined]
  [3] addprocs_locked(manager::Distributed.LocalManager; kwargs::@Kwargs{exeflags::Cmd})
    @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:490
  [4] addprocs_locked
    @ /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:456 [inlined]
  [5] addprocs(manager::Distributed.LocalManager; kwargs::@Kwargs{exeflags::Cmd})
    @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:450
  [6] addprocs
    @ /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:443 [inlined]
  [7] addprocs(np::Int32; restrict::Bool, kwargs::@Kwargs{exeflags::Cmd})
    @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/managers.jl:465
  [8] addprocs
    @ /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/managers.jl:462 [inlined]
  [9] process_opts(opts::Base.JLOptions)
    @ Distributed /opt/local/julia/julia-1.10.4/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:1364
 [10] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [11] invokelatest
    @ ./essentials.jl:889 [inlined]
 [12] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:272
 [13] _start()
    @ Base ./client.jl:552

Is this using an official build of Julia?

Please provide the output of versioninfo().

Yes, this is an official version downloaded from the website.

julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 40 × Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 1 default, 0 interactive, 1 GC (on 40 virtual cores)
Environment:
  JULIA_CPU_TARGET = generic
  JULIA_CONDAPKG_BACKEND = Null
  JULIA_PYTHONCALL_EXE = /opt/local/mamba/envs/py310/bin/python

I should add that I’ve tried this on different Linux machines, and the library errors are sometimes different. All machines run RHEL 9.1, although they may have slightly different versions of some packages. Here are other examples of libraries failing to load when I start julia -p 4:

ERROR: Unable to load dependent library /opt/local/julia/julia-1.10.2/bin/../lib/julia/libjulia-codegen.so.1.10
Message:libLLVM-15jl.so: failed to map segment from shared object
ERROR: Unable to load dependent library /opt/local/julia/julia-1.10.2/bin/../lib/julia/libjulia-internal.so.1.10
Message:libunwind.so.8: failed to map segment from shared object

and another machine:

ERROR: Unable to load dependent library /opt/local/julia/julia-1.10.4/bin/../lib/julia/libstdc++.so.6
Message:/opt/local/julia/julia-1.10.4/bin/../lib/julia/libstdc++.so.6: failed to map segment from shared object
ERROR: Unable to load dependent library /opt/local/julia/julia-1.10.4/bin/../lib/julia/libjulia-internal.so.1.10
Message:/opt/local/julia/julia-1.10.4/bin/../lib/julia/libjulia-internal.so.1.10: failed to map segment from shared object

When I saw the libstdc++.so.6 errors, I followed some advice from a Julia issue and linked the system libstdc++.so.6 in place of the one that ships with Julia. Unfortunately, that did not solve the problem, and it often led to errors loading libjulia-internal.so.1.10.
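
The message "failed to map segment from shared object" is printed by the dynamic loader when it cannot map part of a library into memory. One thing that may be worth ruling out (this is an assumption, nothing in this thread confirms it) is a per-process memory limit inherited by the workers. A minimal sketch that prints the relevant limits in the shell that launches julia -p 4; report_limits is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print the limits that most directly constrain mapping shared libraries.
# Assumption: `julia -p 4` is started from this same shell, so the worker
# processes inherit these limits. report_limits is a hypothetical helper name.
report_limits() {
    # Address-space limit in KiB, or "unlimited".
    echo "virtual memory (ulimit -v): $(ulimit -v)"
    # Data-segment limit in KiB, or "unlimited".
    echo "data segment (ulimit -d): $(ulimit -d)"
}

report_limits
```

A finite value would not by itself explain the error, but it would be one concrete difference between machines to test, for example by raising the limit and retrying.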

Do you have a startup file? What happens when you run julia -p 4 --startup-file=no?

No startup file, and the error is the same.
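
To see whether the failure depends on the number of workers, the same launch can be repeated for several values of -p. A rough sketch, assuming julia is on PATH (or that the hypothetical JULIA variable points at the binary) and treating any run that exits non-zero or prints ERROR as a failure; try_workers is an illustrative helper, not part of Julia:

```shell
#!/bin/sh
# Sketch: run `julia --startup-file=no -p N -e 'exit()'` for several N and
# report which launches succeed. try_workers and JULIA are hypothetical names.
try_workers() {
    bin="$1"
    shift
    for n in "$@"; do
        # Capture stdout and stderr together so loader errors are visible to grep.
        if out="$("$bin" --startup-file=no -p "$n" -e 'exit()' 2>&1)" &&
            ! printf '%s\n' "$out" | grep -q 'ERROR'; then
            echo "-p $n: ok"
        else
            echo "-p $n: failed"
        fi
    done
}

try_workers "${JULIA:-julia}" 1 2 4
```

If -p 1 succeeds while larger values fail, that would point toward a resource that is exhausted as more workers start, rather than a broken installation.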