Error building `MPI`, libmpi could not be found

Thanks. I did try both of those, and they both have the proper paths, so I guess Julia does know what the path is.

I did try my (possibly silly) idea:

export JULIA_MPI_PATH=$EBROOTOPENMPI:$LD_LIBRARY_PATH

Now the path is full of good stuff (not optimal, but I thought it was worth a try), and find_library still fails to confirm that libmpi is in the path.

julia> library
"/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/intel2020/cuda11.0/openmpi/4.0.3/lib64/libmpi.so"

julia> path
"/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/intel2020/cuda11.0/openmpi/4.0.3:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.0.2/lib64::/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/intel2020/cuda11.0/openmpi/4.0.3/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/cuda11.0/libfabric/1.10.1/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/cuda11.0/ucx/1.8.0/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/cuda11.0/gdrcopy/2.1/lib64:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.0.2/lib/stubs:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.0.2/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/lib/intel64:/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/compilers_and_libraries_2020.1.217/linux/tbb/lib/intel64/gcc4.8:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/libevent/2.1.11/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/intel2020/cuda11.0/openmpi/4.0.3/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/cuda11.0/libfabric/1.10.1/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/cuda11.0/ucx/1.8.0/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/cuda11.0/gdrcopy/2.1/lib64:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.0.2/lib/stubs:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.0.2/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/lib/intel64:/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/compilers_and_libraries_2020.1.217/linux/tbb/lib/intel64/gcc4.8"

julia> find_library(library, [path])
""

What is your LD_LIBRARY_PATH?

My LD_LIBRARY_PATH is copied below. I noticed there is a :: in it, which can’t be good.

/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.0.2/lib64::/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/libevent/2.1.11/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/intel2020/cuda11.0/openmpi/4.0.3/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/cuda11.0/libfabric/1.10.1/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/cuda11.0/ucx/1.8.0/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/cuda11.0/gdrcopy/2.1/lib64:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.0.2/lib/stubs:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.0.2/lib:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/lib/intel64:/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/compilers_and_libraries_2020.1.217/linux/tbb/lib/intel64/gcc4.8
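
If nothing else, that empty entry could be collapsed with something like this (a sketch; as far as I know an empty entry makes the dynamic linker also search the current directory, but I have not confirmed it matters here):

export LD_LIBRARY_PATH=$(echo "$LD_LIBRARY_PATH" | sed 's/::/:/g')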

Also, when I look at the ldd output for libmpi.so I see the following:

$ ldd /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/intel2020/cuda11.0/openmpi/4.0.3/lib64/libmpi.so
	linux-vdso.so.1 (0x00007ffc8e5a1000)
	libopen-rte.so.40 => /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/intel2020/cuda11.0/openmpi/4.0.3/lib/libopen-rte.so.40 (0x00007f1504e64000)
	libopen-pal.so.40 => /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/CUDA/intel2020/cuda11.0/openmpi/4.0.3/lib/libopen-pal.so.40 (0x00007f1504bae000)
	libutil.so.1 => /cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libutil.so.1 (0x00007f1504b75000)
	librt.so.1 => /cvmfs/soft.computecanada.ca/gentoo/2020/lib64/librt.so.1 (0x00007f1504b6b000)
	libcudart.so.11.0 => /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.0.2/lib64/libcudart.so.11.0 (0x00007f15048e9000)
	libiomp5.so => /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/lib/intel64/libiomp5.so (0x00007f15044df000)
	libpthread.so.0 => /cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libpthread.so.0 (0x00007f15044bf000)
	libz.so.1 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libz.so.1 (0x00007f15044a5000)
	libhwloc.so.5 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libhwloc.so.5 (0x00007f150445f000)
	libevent-2.1.so.6 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libevent-2.1.so.6 (0x00007f150440a000)
	libevent_pthreads-2.1.so.6 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libevent_pthreads-2.1.so.6 (0x00007f1504405000)
	libimf.so => /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/lib/intel64/libimf.so (0x00007f1503d7c000)
	libsvml.so => /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/lib/intel64/libsvml.so (0x00007f15021f7000)
	libirng.so => /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/lib/intel64/libirng.so (0x00007f1501e8b000)
	libm.so.6 => /cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libm.so.6 (0x00007f1501d48000)
	libgcc_s.so.1 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/libgcc_s.so.1 (0x00007f1501d2e000)
	libintlc.so.5 => /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/lib/intel64/libintlc.so.5 (0x00007f1501ab3000)
	libc.so.6 => /cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libc.so.6 (0x00007f15018f9000)
	libdl.so.2 => /cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libdl.so.2 (0x00007f15018f4000)
	/cvmfs/soft.computecanada.ca/gentoo/2020/lib64/ld-linux-x86-64.so.2 (0x00007f15052ea000)
	libnuma.so.1 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libnuma.so.1 (0x00007f15018e6000)
	libudev.so.1 => /cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libudev.so.1 (0x00007f15018be000)
	libpciaccess.so.0 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libpciaccess.so.0 (0x00007f15018b3000)
	libxml2.so.2 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libxml2.so.2 (0x00007f150174a000)
	libcrypto.so.1.1 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libcrypto.so.1.1 (0x00007f1501490000)
	libselinux.so.1 => /cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libselinux.so.1 (0x00007f1501465000)
	libicuuc.so.65 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libicuuc.so.65 (0x00007f1501283000)
	liblzma.so.5 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/liblzma.so.5 (0x00007f150125b000)
	libpcre.so.1 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libpcre.so.1 (0x00007f15011e8000)
	libicudata.so.65 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libicudata.so.65 (0x00007f14ff735000)
	libstdc++.so.6 => /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/libstdc++.so.6 (0x00007f14ff4cc000)

What does

using Libdl
dlopen("libmpi")

give in Julia?

Sorry I cannot contribute here, and I really should be able to.
Just commenting that it looks like you are using EESSI, which uses CernVM-FS and EasyBuild. Fantastic!

https://www.eessi-hpc.org/

Certainly, copied below. I guess the good news is that libmpi does seem to be on the path, since I get exactly the same output as when I type out the exact location of libmpi.so.

julia> dlopen("libmpi")
ERROR: could not load library "libmpi"
libevent-2.1.so.6: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
   @ Base.Libc.Libdl ./libdl.jl:114
 [2] dlopen (repeats 2 times)
   @ ./libdl.jl:114 [inlined]
 [3] top-level scope
   @ REPL[2]:1

When I try dlopen("libevent-2.1") I get a message saying it cannot open the shared object file. If I instead type in the whole path, I get something a bit more interesting.

julia> dlopen("/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libevent-2.1.so.6")
ERROR: could not load library "/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libevent-2.1.so.6"
libcrypto.so.1.1: cannot open shared object file: No such file or directory

When I try opening the libcrypto library I then get a problem with libc.

julia> dlopen("/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libcrypto.so.1.1")
ERROR: could not load library "/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libcrypto.so.1.1"
/lib64/libc.so.6: version `GLIBC_2.25' not found (required by /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib64/libcrypto.so.1.1)
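
As a quick check of how far apart the two libc versions are, glibc’s libc.so.6 can be executed directly and prints its version (assuming both files are readable from this node):

/lib64/libc.so.6 | head -n 1
/cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libc.so.6 | head -n 1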

Finally, when I try opening libc, I get a segmentation fault. I wonder if this might be the source of the problem?

julia> dlopen("/lib64/libc.so.6")

signal (11): Segmentation fault
in expression starting at REPL[8]:1
_dl_relocate_object at /lib64/ld-linux-x86-64.so.2 (unknown line)
dl_open_worker at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_catch_error at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_open at /lib64/ld-linux-x86-64.so.2 (unknown line)
dlopen_doit at /lib64/libdl.so.2 (unknown line)
_dl_catch_error at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dlerror_run at /lib64/libdl.so.2 (unknown line)
dlopen at /lib64/libdl.so.2 (unknown line)
jl_load_dynamic_library at /buildworker/worker/package_linux64/build/src/dlload.c:257
#dlopen#3 at ./libdl.jl:114
dlopen at ./libdl.jl:114 [inlined]
dlopen at ./libdl.jl:114
jfptr_dlopen_52107.clone_1 at /home/fpoulin/software/julia-1.6.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:115
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:204
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:155 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:562
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:670
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:877
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:825
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:929
eval at ./boot.jl:360 [inlined]
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
repl_backend_loop at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
start_repl_backend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
#run_repl#42 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:317
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:305
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
#874 at ./client.jl:387
jfptr_YY.874_41532.clone_1 at /home/fpoulin/software/julia-1.6.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:714
#invokelatest#2 at ./essentials.jl:708 [inlined]
invokelatest at ./essentials.jl:706 [inlined]
run_main_repl at ./client.jl:372
exec_options at ./client.jl:302
_start at ./client.jl:485
jfptr__start_34289.clone_1 at /home/fpoulin/software/julia-1.6.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:560
repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:702
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4007d8)
Allocations: 2651 (Pool: 2640; Big: 11); GC: 0
Segmentation fault

Thanks for sharing this. I did not know where EasyBuild came from, but it is certainly used heavily on many big servers as part of Compute Canada.

Actually, there is a built-in version of Julia 1.6 on the server, which I was using before. Unfortunately, when I tried to use Plots.jl it failed to produce an mp4 file. This doesn’t happen with the binary version of Julia 1.6, so I presume that’s a bug in the EasyBuild version. Do you think this is something I should mention somewhere, and if so, where exactly?

How did you install Julia?

I used the binaries.

@francispoulin I have had some discussions with the EasyBuild maintainers about Julia.
I would advise mentioning this on their Slack:
https://easybuild.io/join-slack/

EasyBuild exists to make the process of maintaining software on HPC systems easy!
As you have seen, there are several varieties of MPI, compilers, maths libraries, etc. on any HPC system. EasyBuild has the concept of ‘toolchains’, so that applications can be built and maintained with given combinations of the basic tools, for example an Intel compiler version versus a GNU compiler version.
Also, on HPC systems you will have software packages which are optimised for the particular CPU architecture you run on, not just the generic builds.

It looks like it is picking up an old libc. Perhaps try adding /cvmfs/soft.computecanada.ca/gentoo/2020/lib64 to your LD_LIBRARY_PATH?
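
Something along these lines before starting Julia, just to test (untested, adjust for your shell):

export LD_LIBRARY_PATH=/cvmfs/soft.computecanada.ca/gentoo/2020/lib64:$LD_LIBRARY_PATH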

@francispoulin I had a similar issue and I was wondering if you were able to build MPI?

Sorry for the late reply. I did try that and it didn’t help.

I am asking for support from people who maintain the machines and hope they can figure it out. I will let you know if/when we get it working.

Unfortunately, I have not had a successful build on the server. I’m asking for some technical support, and if others manage to do it, I will certainly share what I learn. Sorry that I could not be of more help.

Most of the time, messages like

fileX.so: cannot open shared object file: No such file or directory

are misleading, because it isn’t fileX.so that can’t be found, but one of its dependencies. Annoyingly, the message doesn’t tell you which file actually can’t be found. On Linux you have to use strace to find out which file is being searched for but can’t be found. There is nothing Julia can do about it; operating systems are just unhelpful here.

Thank you @giordano for the help. I have never used strace before. Could you point me to an example that might help me figure out what I need to do?

Start julia with

strace julia 2> strace.log

After you get the error message, you can see in the strace.log file what files are being searched. This file is going to be large, have fun.
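
If it helps, filtering the log for failed opens of shared objects usually narrows it down quickly (strace reports missing files with ENOENT):

grep ENOENT strace.log | grep '\.so'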

You might also be able to use LD_DEBUG, which will output info from the linker. I think LD_DEBUG=libs julia should be sufficient?
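
For example (a sketch; LD_DEBUG writes to stderr, so redirect it to a file and search for the library name afterwards):

LD_DEBUG=libs julia 2> ld_debug.log
grep libmpi ld_debug.log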

Thanks for your reply.

I have now figured it out and it works.
I did not change any other environment variables, but did the following:

  1. go to a terminal
  2. type emacs ~/.julia/prefs/MPI.toml
  3. set path = "/usr/local/Cellar/open-mpi/4.1.1_2" (see the snippet below)
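
Equivalently, the file can be created from the terminal in one go (the Homebrew path is from my machine and will differ on other systems; the quotes must be plain ASCII quotes):

mkdir -p ~/.julia/prefs
cat > ~/.julia/prefs/MPI.toml << 'EOF'
path = "/usr/local/Cellar/open-mpi/4.1.1_2"
EOF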

It seems that this way of setting the path is different from the methods in the two links in my original post (see Error building MPI---ERROR: LoadError: libmpi could not be found), which set it in ~/.bash_profile or ~/.profile.

Maybe it’s a Julia version thing.