Because I use MPI.jl in my code, the compiled bundle contains its own MPI, and I have always used that bundled MPI for consistency. Recently, however, this MPI started failing. After some digging, I think the problem is in the 4.2.0+0 build of MPICH_jll.
The issue is easy to reproduce if you have several artifact versions on your computer.
~/.julia/artifacts/69656c7b06da50dba7bdeef69f0ac06e478ac05a/bin/mpirun -h
works fine; this is the v4.1.2+0 build.
~/.julia/artifacts/0ed4137b58af5c5e3797cb0c400e60ed7c308bae/bin/mpirun -h
also works fine; this is the v4.1.2+1 build.
However,
~/.julia/artifacts/cacaf4d3b5c79a7723468fc6b854afba34839634/bin/mpirun -h
(the v4.2.0+0 build) fails with this error message:
error while loading shared libraries: libhwloc.so.15: cannot open shared object file: No such file or directory
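To see which shared libraries the loader cannot resolve, one option is to filter the output of ldd. The snippet below is only a sketch: missing_libs is a helper name I made up, and it assumes the usual glibc ldd output format, where unresolved entries end in "=> not found".

```shell
# Hypothetical helper: reads `ldd` output on stdin and prints the name of
# every library the dynamic loader reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (artifact hash taken from the post; adjust it for your machine):
# ldd ~/.julia/artifacts/cacaf4d3b5c79a7723468fc6b854afba34839634/bin/mpirun | missing_libs
```

If the diagnosis above is right, the v4.2.0+0 binary should list libhwloc.so.15 here, while the two v4.1.2 builds should list nothing.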
The reason, I believe, is that hwloc was linked statically in previous versions but is linked dynamically in the new v4.2.0+0 build. Two pieces of evidence support this:
- The size of mpiexec.hydra (and some other hydra-related binaries) barely differs between v4.1.2+0 and v4.1.2+1, but is much smaller in v4.2.0+0. I believe the old versions embed the static hwloc library, which accounts for the difference.
- The v4.2.0+0 version introduces a dependency on Hwloc_jll.
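The size comparison in the first point can be repeated with a small loop over the artifact directories. This is a rough sketch: hydra_sizes is a hypothetical helper name, and it assumes each artifact directory keeps the binary at bin/mpiexec.hydra, as in the paths above.

```shell
# Hypothetical helper: print the size in bytes of bin/mpiexec.hydra for each
# artifact directory passed as an argument, one "size<TAB>directory" per line.
hydra_sizes() {
    for dir in "$@"; do
        f="$dir/bin/mpiexec.hydra"
        if [ -f "$f" ]; then
            # wc -c is portable; tr strips the padding some implementations add
            printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$dir"
        else
            printf 'missing\t%s\n' "$dir"
        fi
    done
}

# Usage (artifact hashes taken from the post):
# hydra_sizes ~/.julia/artifacts/69656c7b06da50dba7bdeef69f0ac06e478ac05a \
#             ~/.julia/artifacts/0ed4137b58af5c5e3797cb0c400e60ed7c308bae \
#             ~/.julia/artifacts/cacaf4d3b5c79a7723468fc6b854afba34839634
```

A noticeably smaller number for the last directory would be consistent with hwloc no longer being embedded, though a size difference alone does not prove it.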
The problem is that the artifacts system only works within Julia, and MPI is not part of Julia. mpiexec.hydra, even the one inside ~/.julia/artifacts, will not search the Julia artifact path for the shared objects it needs unless we change LD_LIBRARY_PATH, which is inconvenient, especially when creating portable binaries.
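For completeness, the LD_LIBRARY_PATH workaround could look roughly like this. It is a sketch under assumptions: with_libdir is a hypothetical wrapper name, and HWLOC_LIB stands for the lib directory of whatever Hwloc_jll artifact is installed on your machine, which I have not listed here.

```shell
# Hypothetical wrapper: run a command with one extra directory prepended to
# LD_LIBRARY_PATH, leaving the caller's own environment unchanged.
with_libdir() {
    libdir=$1
    shift
    # ${VAR:+...} appends the old value only if it was set and non-empty
    env LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Usage (HWLOC_LIB is a placeholder you must set yourself):
# with_libdir "$HWLOC_LIB" ~/.julia/artifacts/cacaf4d3b5c79a7723468fc6b854afba34839634/bin/mpirun -h
```

This keeps the change local to one command, but it still means every launcher script in a portable bundle has to know where the hwloc library lives.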
It is worth mentioning that the Open MPI documentation warns:
“Regardless, it is critically important that if an MPI application — or any of its dependencies — uses Hwloc, it uses the same Hwloc with which Open MPI was compiled.”
So I cannot see why hwloc was changed from static linking to dynamic linking. Any ideas?