Hi,
I am trying to use ScaLAPACK with @ccall to solve a linear system with QR decomposition. On my laptop it works fine with OpenMPI when I look up the default ScaLAPACK library directly:
const libscalapack = Base.Libc.Libdl.find_library("libscalapack")
Using Intel MKL (via OneAPI), however, doesn’t work. find_library seemingly doesn’t find libmkl_scalapack_ilp64, but using dlopen directly reveals a dependency cascade between the different libmkl_* libraries (undefined symbol errors). Loading all the needed libraries did finally let me load libmkl_scalapack_ilp64, but actually calling routines then ends in segmentation faults.
Is it possible to use MPI.jl and/or MKL.jl to load the MKL system libraries (maybe with Base.Libc.Libdl.RTLD_GLOBAL) so that loading ScaLAPACK later has its dependencies fulfilled?
I would like to use the system-installed libraries instead of the BinaryBuilder-built Julia artifacts, for use on HPC clusters.
Thanks!
Hi Sascha,
Can I ask, what do you propose to use as the setup for loading libraries? Do you need to use MKL.jl as a dependency in your project?
Is one option to directly load the ScaLAPACK system library with the MKL system libraries and use @ccall? If not, why not? (e.g. do you want to access other functions in MKL.jl?)
Hi James, thanks for your reply!
I want to use ScaLAPACK to solve a distributed linear system. I tried to use find_library to load the dynamic library. This works for the libscalapack that is installed on my laptop.
I would also like to have the option to use the ScaLAPACK implementation from MKL (OneAPI). This does not work, as far as my testing went. You have to load at least 5 MKL libraries to satisfy the interdependencies. Even then the ccall fails.
Summary: With libscalapack it works, with libmkl_scalapack_ilp64 it fails. Maybe I’m loading it wrongly, maybe I’m calling it wrongly, maybe I’m configuring it wrongly. Either way it seems quite unintuitive, and so far I haven’t found anything about Julia and MPI/MKL/ScaLAPACK that helps in solving this problem.
I had hoped that MKL.jl would have triggers or something similar that could satisfy the interdependencies, ideally in a way that works, unlike my own attempts. Even so, I would like to use the system libraries instead of Julia artifacts, since they would probably be more efficient for inter-node communication on a cluster. As far as I know, development is going in that direction, e.g. with MPItrampoline.
I also posted here in case somebody already knows what the underlying problem is and has a solution.
You might find this helpful for background:
https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html#gs.0nqhog
Using basic parameters, I get this link line:
-L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl
which might reflect the libraries you need to link against MKL for your use case.
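The link line also suggests what would have to be opened by hand from Julia. A minimal sketch, assuming the library names below (taken from the link line, with libmpi added as a guess for the MPI runtime) can be found by the system loader. The ordering, lower-level libraries first, is also an assumption about their dependencies, and preload_libraries is a hypothetical helper, not part of MKL.jl or MPI.jl:

```julia
using Libdl

# Assumed names, mirroring the link line above; lower-level libraries first.
# Adjust to what is actually installed on the cluster.
const MKL_SCALAPACK_DEPS = [
    "libmpi",                        # assumption: the Intel MPI runtime
    "libiomp5",
    "libmkl_core",
    "libmkl_intel_thread",
    "libmkl_intel_ilp64",
    "libmkl_blacs_intelmpi_ilp64",
    "libmkl_scalapack_ilp64",
]

# Hypothetical helper: open each library with RTLD_GLOBAL so that its symbols
# are visible to libraries opened later. Returns the handles that were opened
# and the names that could not be found.
function preload_libraries(names; flags = RTLD_LAZY | RTLD_GLOBAL)
    handles = Dict{String,Ptr{Cvoid}}()
    missing_libs = String[]
    for name in names
        handle = Libdl.dlopen(name, flags; throw_error = false)
        if handle === nothing
            push!(missing_libs, name)
        else
            handles[name] = handle
        end
    end
    return handles, missing_libs
end
```

Calling `handles, missing_libs = preload_libraries(MKL_SCALAPACK_DEPS)` and checking that `missing_libs` is empty before the first ccall would at least separate loading problems from calling problems.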
As you’ve found, MKL.jl is built against libiomp5 in IntelOpenMP_jll but not with mkl_scalapack_ilp64.
But none of this directly solves your case; you may need to call into your properly linked HPC version of libmkl_rt rather than libmkl_scalapack_ilp64? Sorry I can’t be of much help.
Follow-up: I have tried this again.
It works fine on CentOS 8 (libscalapack) and Ubuntu 20.04 LTS (libscalapack-openmpi) by just using ccall((:func, libname), Cvoid, (…), …). In both cases only this library is needed.
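For reference, a call of that shape might look as follows. This is only a sketch under assumptions: the library name "libscalapack", the Fortran-mangled symbol blacs_pinfo_ (with trailing underscore), and 32-bit integers (an LP64 build) are guesses about the local installation, and blacs_rank_and_size is a hypothetical wrapper name.

```julia
# Assumption: name of the system ScaLAPACK library; adjust per distribution.
const libscalapack = "libscalapack"

# Hypothetical wrapper around BLACS_PINFO(MYPNUM, NPROCS). Fortran passes all
# arguments by reference, hence the Ref arguments. Cint assumes an LP64 build;
# an ILP64 library would need Int64 instead.
function blacs_rank_and_size()
    mypnum = Ref{Cint}(0)
    nprocs = Ref{Cint}(0)
    ccall((:blacs_pinfo_, libscalapack), Cvoid,
          (Ref{Cint}, Ref{Cint}), mypnum, nprocs)
    return Int(mypnum[]), Int(nprocs[])
end
```

The integer width is the part most likely to differ between installations: passing 32-bit references to an ILP64 library (or the reverse) would not fail at load time but could corrupt values at call time.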
With Intel OneAPI MKL it is more complicated since there are dependencies between the different libraries for modularity. I’m using Libdl.dlopen(libname, RTLD_LAZY|RTLD_DEEPBIND|RTLD_GLOBAL) to open:
libmpi
libmkl_rt
libmkl_blacs_intelmpi_lp64
libmkl_scalapack_lp64
This works to some extent. For example, blacs_pinfo returns the correct MPI rank and size. But when I call blacs_get to query the global BLACS context for later use, I receive a suspiciously large integer (1140850688), and subsequent calls to blacs_gridinit and blacs_gridinfo lead to grid values of -1, which denotes an invalid context.
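It may help to look at that integer in hexadecimal. The check below shows that 1140850688 is exactly 0x44000000. In MPICH-derived MPI implementations (Intel MPI is one), 0x44000000 is the handle value of MPI_COMM_WORLD, so the number may be a communicator handle returned as the default system context rather than garbage. That interpretation is an assumption and is not confirmed for MKL’s BLACS here.

```julia
ctxt = 1140850688                  # value reported by blacs_get above

println(string(ctxt, base = 16))   # prints 44000000
println(ctxt == 0x44000000)        # prints true
println(ctxt <= typemax(Int32))    # prints true: fits in a 32-bit (LP64) integer
```

If that reading holds, the value itself would be plausible, and the -1 grid values would have to come from somewhere else, for example from the arguments passed to blacs_gridinit.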
I also tried to replace libmkl_rt with several other libraries, but the result is the same: the calls work but the BLACS context is unusable.
Does anybody here have an idea what might be going wrong?