Precompiled packages when $HOME is shared among different systems?

Hi,
I’m running Julia on our clusters. At the moment, the users’ $HOME directories are shared between two clusters while the new one is phased in.

Now I’d like to run MPI applications in this configuration. For convenience, I’m using MPItrampoline (eschnett/MPItrampoline on GitHub), a forwarding MPI implementation that can use any other MPI implementation via an MPI ABI.

I instantiate a small project and can run MPI programs on the old cluster:

rkube@nid00009:~/julia_envs/xgc_analysis_cori> cat Project.toml
name = "xgc_analysis_cori"
uuid = "b1d4e0b1-884e-40f9-9bb8-eab53f6d8b80"
authors = ["Ralph Kube <ralph_kube@gmx.net>"]
version = "0.1.0"

[deps]
ADIOS2 = "e0ce9d3b-0dbd-416f-8264-ccca772f60ec"
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"

[extras]
MPIPreferences = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"
rkube@nid00009:~/julia_envs/xgc_analysis_cori> ls
LocalPreferences.toml  Manifest.toml  Project.toml  src
rkube@nid00009:~/julia_envs/xgc_analysis_cori> cat src/01-mpi-hello.jl
# examples/01-hello.jl
using MPI
MPI.Init()

comm = MPI.COMM_WORLD
print("Hello world, I am rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))\n")

rkube@nid00009:~/julia_envs/xgc_analysis_cori> srun -n 4 $HOME/software/julia-1.8.1/bin/julia --project=. src/01-mpi-hello.jl 
Hello world, I am rank 0 of 4
Hello world, I am rank 3 of 4
Hello world, I am rank 1 of 4
Hello world, I am rank 2 of 4

When I do exactly the same on the new cluster, Julia tries to load the MPI library installed on the old system, and the program crashes:

rkube@nid005276:~/julia_envs/xgc_analysis_pm> srun -n 4 $HOME/software/julia-1.8.1/bin/julia --project=. src/01-mpi-hello.jl 
ERROR: LoadError: InitError: could not load library "/opt/cray/pe/mpt/7.7.19/gni/mpich-gnu/8.2/lib/libmpich"


Stacktrace:
/opt/cray/pe/mpt/7.7.19/gni/mpich-gnu/8.2/lib/libmpich.so: cannot open shared object file: No such file or directory
Stacktrace:
  [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
    @ Base.Libc.Libdl ./libdl.jl:117
  [2] dlopen
    @ ./libdl.jl:116 [inlined]
  [3] __init__()
    @ MPI ~/.julia/packages/MPI/08SPr/src/MPI.jl:66
  [4] _include_from_serialized(pkg::Base.PkgId, path::String, depmods::Vector{Any})
    @ Base ./loading.jl:831
  [5] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt64)
    @ Base ./loading.jl:1039
  [6] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1315
  [7] _require_prelocked(uuidkey::Base.PkgId)
    @ Base ./loading.jl:1200
  [8] macro expansion
    @ ./loading.jl:1180 [inlined]
  [9] macro expansion
    @ ./lock.jl:223 [inlined]
 [10] require(into::Module, mod::Symbol)
  [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
    @ Base ./loading.jl:1144
during initialization of module MPI
in expression starting at /global/u2/r/rkube/julia_envs/xgc_analysis_pm/src/01-mpi-hello.jl:2

Does anyone have an idea how to keep Julia installations separate for multiple systems?

Set JULIA_DEPOT_PATH to separate directories.

But from what I can gather from the error message, that shouldn’t have much to do with the depot. What version of MPI.jl are you using?

Hmm, perhaps one might want to use shared environments and have the environments in the depots. Then you can have different LocalPreferences.toml in each depot.
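
A minimal sketch of that idea, assuming one depot per cluster and relying on Julia resolving `--project=@name` to `<depot>/environments/<name>`. The helper `shared_env_dir`, the depot directory, and the environment name `xgc_analysis` are placeholders made up for illustration, not something from this thread:

```shell
# Hypothetical helper: print the directory of a named shared environment
# inside a given depot. Julia looks up `--project=@name` under
# <depot>/environments/<name>.
shared_env_dir() {
    # $1: depot directory, $2: shared environment name
    printf '%s/environments/%s\n' "$1" "$2"
}

# Placeholder depot for one cluster; a second cluster would use another one.
depot="${HOME}/.julia_host1"
env_dir="$(shared_env_dir "$depot" xgc_analysis)"

# Project.toml, Manifest.toml and LocalPreferences.toml would sit side by side here.
echo "shared environment directory: $env_dir"

# On that cluster one would then start Julia roughly like this:
#   JULIA_DEPOT_PATH="$depot" julia --project=@xgc_analysis
```

Since the LocalPreferences.toml sits next to the Project.toml of the shared environment, each depot would end up with its own MPI preferences.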

Thanks for the help. I’m solving the problem as follows.

First, I set JULIA_DEPOT_PATH in my .bashrc based on the cluster I’m logging in to:
# Note: shell assignments must not have spaces around '='.
case $LOGIN_HOST in
    "cluster1")
        export JULIA_DEPOT_PATH="${HOME}/.julia_host1"
        ;;
    "cluster2")
        export JULIA_DEPOT_PATH="${HOME}/.julia_host2"
        ;;
esac

Then I’m installing MPI.jl on each cluster using the system’s MPI library.

Then I can set up environments on each cluster as usual.
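
If more clusters get added later, the per-cluster branches could be folded into a small function. This is only a sketch: `depot_for_cluster` is a hypothetical helper, and it assumes `LOGIN_HOST` holds a short cluster name as in the snippet above.

```shell
# Hypothetical helper: map a cluster name to its own depot directory.
depot_for_cluster() {
    # $1: cluster name, e.g. the value of LOGIN_HOST (assumed to be set by the site)
    if [ -z "$1" ]; then
        echo "depot_for_cluster: empty cluster name" >&2
        return 1
    fi
    printf '%s/.julia_%s\n' "$HOME" "$1"
}

# Only override the depot when the cluster name is known;
# otherwise Julia keeps using its default depot (~/.julia).
if depot="$(depot_for_cluster "${LOGIN_HOST:-}" 2>/dev/null)"; then
    export JULIA_DEPOT_PATH="$depot"
fi
```

With `LOGIN_HOST=cluster1` this would select `${HOME}/.julia_cluster1`. The directory names differ from the `.julia_host1`/`.julia_host2` used above, so existing depots would need to be renamed or the format string adjusted.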