Hi,
I’m running Julia on our clusters. At the moment, the users' $HOME directories are shared between two clusters while the new one is phased in.
Now I’d like to run MPI applications in this configuration. For convenience, I’m using MPItrampoline (eschnett/MPItrampoline on GitHub), a forwarding MPI implementation that can use any other MPI implementation via an MPI ABI.
I instantiate a small project and can run MPI programs on the old cluster:
rkube@nid00009:~/julia_envs/xgc_analysis_cori> cat Project.toml
name = "xgc_analysis_cori"
uuid = "b1d4e0b1-884e-40f9-9bb8-eab53f6d8b80"
authors = ["Ralph Kube <ralph_kube@gmx.net>"]
version = "0.1.0"
[deps]
ADIOS2 = "e0ce9d3b-0dbd-416f-8264-ccca772f60ec"
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"
[extras]
MPIPreferences = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"
rkube@nid00009:~/julia_envs/xgc_analysis_cori> ls
LocalPreferences.toml Manifest.toml Project.toml src
rkube@nid00009:~/julia_envs/xgc_analysis_cori> cat src/01-mpi-hello.jl
# examples/01-hello.jl
using MPI
MPI.Init()
comm = MPI.COMM_WORLD
print("Hello world, I am rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))\n")
rkube@nid00009:~/julia_envs/xgc_analysis_cori> srun -n 4 $HOME/software/julia-1.8.1/bin/julia --project=. src/01-mpi-hello.jl
Hello world, I am rank 0 of 4
Hello world, I am rank 3 of 4
Hello world, I am rank 1 of 4
Hello world, I am rank 2 of 4
When I do exactly the same on the new cluster, Julia tries to load the MPI library installed on the old system, and the program crashes:
rkube@nid005276:~/julia_envs/xgc_analysis_pm> srun -n 4 $HOME/software/julia-1.8.1/bin/julia --project=. src/01-mpi-hello.jl
ERROR: LoadError: InitError: could not load library "/opt/cray/pe/mpt/7.7.19/gni/mpich-gnu/8.2/lib/libmpich"
/opt/cray/pe/mpt/7.7.19/gni/mpich-gnu/8.2/lib/libmpich.so: cannot open shared object file: No such file or directory
Stacktrace:
[1] dlopen(s::String, flags::UInt32; throw_error::Bool)
@ Base.Libc.Libdl ./libdl.jl:117
[2] dlopen
@ ./libdl.jl:116 [inlined]
[3] __init__()
@ MPI ~/.julia/packages/MPI/08SPr/src/MPI.jl:66
[4] _include_from_serialized(pkg::Base.PkgId, path::String, depmods::Vector{Any})
@ Base ./loading.jl:831
[5] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt64)
@ Base ./loading.jl:1039
[6] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1315
[7] _require_prelocked(uuidkey::Base.PkgId)
@ Base ./loading.jl:1200
[8] macro expansion
@ ./loading.jl:1180 [inlined]
[9] macro expansion
@ ./lock.jl:223 [inlined]
[10] require(into::Module, mod::Symbol)
@ Base ./loading.jl:1144
during initialization of module MPI
in expression starting at /global/u2/r/rkube/julia_envs/xgc_analysis_pm/src/01-mpi-hello.jl:2
Does anyone have an idea how to keep the Julia installations for multiple systems separate when they share one $HOME?
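One direction that might be worth trying is to give each cluster its own Julia depot through the `JULIA_DEPOT_PATH` environment variable, so that installed packages and precompiled caches are not shared across systems. The sketch below is only an illustration under assumptions: the helper `depot_for_cluster`, the variable `CLUSTER_NAME`, and the `~/.julia-<name>` directory layout are all hypothetical, and whether this resolves the error above has not been verified.

```shell
# Hypothetical helper: map a cluster label to its own Julia depot directory.
# The "$HOME/.julia-<label>" layout is an assumption, not a Julia convention.
depot_for_cluster() {
    # $1 is a label chosen by the user, e.g. "cori" or "pm"
    printf '%s/.julia-%s\n' "$HOME" "$1"
}

# CLUSTER_NAME is an assumed variable that would be set differently on each
# system (for example in the job script); it falls back to "default" if unset.
JULIA_DEPOT_PATH="$(depot_for_cluster "${CLUSTER_NAME:-default}")"
export JULIA_DEPOT_PATH
```

With a setup like this, each cluster would need its own package installation in its depot (for example by instantiating the project once per system) before launching with srun as above.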