How to load ScaLAPACK library from Intel MKL (library inter-dependencies)?

Hi,

I am trying to use ScaLAPACK with @ccall to solve a linear system with QR decomposition. On my laptop it works fine with OpenMPI and directly searching for the default ScaLAPACK library:
const libscalapack = Base.Libc.Libdl.find_library("libscalapack")

Using Intel MKL (via OneAPI), however, doesn’t work here. find_library seemingly doesn’t find libmkl_scalapack_ilp64, and using dlopen directly reveals a dependency cascade between the different libmkl_* libraries (undefined symbol errors). Loading all of the needed libraries did finally let me load libmkl_scalapack_ilp64, but actually calling routines ends in segmentation faults.

Is it possible to use MPI.jl and/or MKL.jl to load the MKL system libraries (maybe with Base.Libc.Libdl.RTLD_GLOBAL) so that loading ScaLAPACK later has its dependencies fulfilled?
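For reference, here is a minimal sketch of the RTLD_GLOBAL idea using only Libdl from the standard library. The helper name load_global is hypothetical, the library names are the ones mentioned in this thread, and the load order is an assumption that may need adjusting on a given system. It is not something MPI.jl or MKL.jl provide.

```julia
using Libdl

# Open each library with RTLD_GLOBAL so that its symbols are visible to
# libraries opened afterwards. Returns `name => handle` pairs; the handle is
# `nothing` when the library could not be opened.
function load_global(names)
    flags = RTLD_LAZY | RTLD_GLOBAL
    return [name => dlopen(name, flags; throw_error = false) for name in names]
end

# Library names taken from this thread. The ORDER is an assumption
# (dependencies first, ScaLAPACK last) and may need adjusting.
mkl_chain = [
    "libiomp5",
    "libmkl_core",
    "libmkl_intel_thread",
    "libmkl_intel_ilp64",
    "libmkl_blacs_intelmpi_ilp64",
    "libmkl_scalapack_ilp64",
]

for (name, handle) in load_global(mkl_chain)
    println(rpad(name, 32), handle === nothing ? "not found" : "loaded")
end
```

With throw_error = false a missing library shows up as "not found" instead of raising an error, which makes it easier to see where the chain breaks. The libraries have to be on the loader search path (for example via LD_LIBRARY_PATH) for the short names to resolve.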

I would like to use the system-installed libraries instead of the binary-built Julia artifacts for use on HPC clusters.

Thanks!

Hi Sascha,

Can I ask, what do you propose to use as the setup for loading libraries? Do you need to use MKL.jl as a dependency in your project?
Is one option to directly load the ScaLAPACK system library with the MKL system libraries and use @ccall? If not, why not? (e.g. do you want to access other functions in MKL.jl ?)

Hi James, thanks for your reply!

I want to use ScaLAPACK to solve a distributed linear system. I tried to use find_library to load the dynamic library. This works for the libscalapack that is installed on my laptop.

I would also like to have the option to use the ScaLAPACK implementation from MKL (OneAPI). This does not work, as far as my testing went. You have to load at least 5 MKL libraries to satisfy the interdependencies. Even then the ccall fails.

Summary: With libscalapack it works, with libmkl_scalapack_ilp64 it fails. Maybe I’m loading it wrongly, maybe I’m calling it wrongly, maybe I’m configuring it wrongly. Either way it seems to be quite unintuitive. And so far I didn’t find anything about Julia and MPI/MKL/ScaLAPACK that helps in solving this problem.

I had hoped that MKL.jl would have triggers or something similar that could fulfill the interdependencies, ideally in a way that works, unlike my own attempts. Even then, I would like to use the system libraries instead of Julia artifacts, since they would probably be more efficient on a cluster for internode communication. And as far as I know, development is going in that direction, e.g. with MPItrampoline.

I also posted here in case somebody already knows what the underlying problem is and has a solution.

You might find this helpful for background:
https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html#gs.0nqhog
Using basic parameters, I get this linking line

 -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl

which might reflect the libraries you need to link against for your use case.
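As a rough sketch, a link line like this can be turned into a candidate dlopen order. The helper link_libs is hypothetical, and reversing the link order is only a heuristic: on a link line, libraries that satisfy undefined symbols usually come after the libraries that need them, so the reversed list puts dependencies first. The MKL layers depend on each other, so this may still need lazy binding or manual reordering.

```julia
# Extract the `-l<name>` entries from a link line, in link order.
# `-L<dir>` and `-Wl,...` entries are skipped because only `-l` directly
# after whitespace (or at the start of the line) is matched.
function link_libs(line::AbstractString)
    return ["lib" * m.captures[1] for m in eachmatch(r"(?:^|\s)-l(\S+)", line)]
end

# raw"..." keeps `${MKLROOT}` literal instead of interpolating it.
line = raw"-L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl"

# Heuristic: open dependencies first, i.e. in reverse link order.
for name in reverse(link_libs(line))
    println(name)
end
```

Note that this link line uses the ILP64 (64-bit integer) libraries, so every integer argument passed from Julia would have to be an Int64.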

As you’ve found, MKL.jl is built against libiomp5 in IntelOpenMP_jll but not with mkl_scalapack_ilp64.

But none of this directly solves your case; you may need to call into your properly linked HPC version of libmkl_rt rather than libmkl_scalapack_ilp64? Sorry I can’t be of much help.

Follow-up: I have tried this again.

It works fine on CentOS 8 (libscalapack) and Ubuntu 20.04 LTS (libscalapack-openmpi) by just using ccall((:func, libname), Cvoid, (…), …). In both cases only this library is needed.

With Intel OneAPI MKL it is more complicated since there are dependencies between the different libraries for modularity. I’m using Libdl.dlopen(libname, RTLD_LAZY|RTLD_DEEPBIND|RTLD_GLOBAL) to open:

  1. libmpi
  2. libmkl_rt
  3. libmkl_blacs_intelmpi_lp64
  4. libmkl_scalapack_lp64

This works to some extent. For example, blacs_pinfo returns the correct MPI rank and size. But when I call blacs_get to query the global BLACS context for later use, I receive a suspiciously large integer (1140850688), and subsequent calls to blacs_gridinit and blacs_gridinfo lead to grid values of -1, which denotes an invalid context.
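One quick check on that value is to look at it in hexadecimal. MPICH-derived MPI implementations, which include Intel MPI, define the MPI_COMM_WORLD handle as 0x44000000, so the number returned by blacs_get may simply be that communicator handle rather than garbage. That is an interpretation, not something confirmed in this thread.

```julia
# Value returned by blacs_get in the post above.
ctx = 1140850688

# Print it in hexadecimal to compare against known MPI handle constants.
println(string(ctx, base = 16))   # prints 44000000

# MPI_COMM_WORLD in MPICH-derived implementations (from their mpi.h).
mpich_comm_world = 0x44000000
println(ctx == mpich_comm_world)  # prints true
```

If that reading is right, the context value itself is plausible and the -1 grid values would have to come from somewhere else, such as how the arguments are passed.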

I also tried to replace libmkl_rt with several other libraries but the result is the same: The calls work but the BLACS context is unusable.
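Another thing that may be worth ruling out is the integer width. The LP64 libraries expect 32-bit integers and the ILP64 libraries expect 64-bit integers, while a plain Julia integer literal is an Int64 on 64-bit systems. The sketch below uses a hypothetical helper, mkl_int, to pick the matching type from the library name. The symbol name in the commented-out call is an assumption and should be checked against the library actually loaded.

```julia
# Hypothetical helper: choose the integer type matching the MKL interface.
# "ilp64" is tested explicitly because the string "lp64" is contained in it.
mkl_int(libname::AbstractString) = occursin("ilp64", libname) ? Int64 : Int32

libscalapack = "libmkl_scalapack_lp64"   # library name from the list above
MKLInt = mkl_int(libscalapack)           # Int32 for the LP64 interface

# Fortran-style routines take their arguments by reference, so the Ref
# element type has to match the library's integer width.
rank = Ref{MKLInt}(0)
nprocs = Ref{MKLInt}(0)

# Assumed symbol name; only meaningful once the MKL libraries are loaded:
# ccall((:blacs_pinfo_, libscalapack), Cvoid, (Ref{MKLInt}, Ref{MKLInt}), rank, nprocs)

println("integer type for ", libscalapack, ": ", MKLInt)
```

With libmkl_rt the interface layer is selected at run time (for example through the MKL_INTERFACE_LAYER environment variable), so the width it expects should agree with the explicitly loaded lp64 or ilp64 BLACS and ScaLAPACK libraries.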

Does anybody here have an idea what might be going wrong?