Segmentation fault when using ccall to call a BLAS-like extension function from Intel MKL

NatSpring · December 3, 2024, 1:22am

I have written many Julia wrapper-functions that call routines from various parts of the Intel oneAPI library via the library’s .so file with ccall. This typically works without issue for BLAS routines, LAPACK routines and vector mathematical functions.

However, whenever I try to write a Julia wrapper-function for routines that are from the “BLAS-like Extensions” category of Intel MKL I get segmentation faults and Julia crashes.

Here is a minimal example showing how I call the mkl_dimatcopy routine from Intel oneAPI from Julia using the library file libmkl_rt.so:

# Path to MKL library file.
const libmklSo = "/opt/intel/oneapi/mkl/latest/lib/libmkl_rt.so"
@assert(isfile(libmklSo))

# Random 10 x 10 matrix to transpose.
AB = rand(10, 10)

# Verify that AB is 64-byte aligned which is a requirement of mkl_dimatcopy. 
@assert((UInt(pointer(AB)) % 64) == 0)

# Call the mkl_dimatcopy routine.
ccall((:mkl_dimatcopy, libmklSo), Cvoid,
    (
        Cchar,        # ordering ('C')
        Cchar,        # trans ('T')
        Csize_t,      # rows (10)
        Csize_t,      # cols (10)
        Cdouble,      # alpha (1.0)
        Ptr{Cdouble}, # AB
        Csize_t,      # lda (10)
        Csize_t,      # ldb (10)
    ),
    'C', 'T', 10, 10, 1.0, AB, 10, 10)

The mkl_dimatcopy function scales and optionally transposes a matrix in-place. But when I run the code above I instead get a segmentation fault and Julia crashes.

I have also tried to verify that the mkl_dimatcopy symbol is in the libmkl_rt.so file I am using, and it seems to be:

$ nm -D /opt/intel/oneapi/mkl/latest/lib/libmkl_rt.so | grep mkl_dimatcopy
00000000005ff510 T mkl_dimatcopy
00000000005ff510 T mkl_dimatcopy_
00000000005ff640 T mkl_dimatcopy_batch
00000000005ff640 T mkl_dimatcopy_batch_
00000000005ff770 T mkl_dimatcopy_batch_strided
00000000005ff770 T mkl_dimatcopy_batch_strided_

Can anyone with a greater ccall/Julia/MKL knowledge than me tell me why this is happening and how I can fix it?

I also asked this question on the Intel forum a few days ago, but I have not received any reply there and since the issue might be Julia/ccall related and not Intel related it might be better suited for this forum.

giordano · December 3, 2024, 1:43am

Does an equivalent C code reproduce the segmentation fault?

giordano · December 3, 2024, 2:01am

Judging by Re:mkl_dmatocopy segfaults and/or produces incorrect results - Intel Community the signature documented at https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2025-0/mkl-imatcopy.html is wrong and all arguments must be pointers. I can confirm the segfault with

julia> using MKL_jll

julia> AB = rand(10, 10);

julia> @ccall libmkl_rt.mkl_dimatcopy('C'::Cchar, 'T'::Cchar, size(AB, 1)::Csize_t, size(AB, 2)::Csize_t, AB::Ptr{Cdouble}, size(AB, 2)::Csize_t, size(AB, 1)::Csize_t)::Cvoid

[3675415] signal 11 (1): Segmentation fault
in expression starting at REPL[23]:1
mkl_dimatcopy at /home/mgiordano/.julia/artifacts/0ebc3c3e5d3f477a62b02a72a65fad4803ad9ed6/lib/libmkl_intel_lp64.so.2 (unknown line)

but adapting the signature as mentioned above (I probably got wrong the various size(AB, n) calls, adapt them as necessary, at least it’s not a problem for a square matrix):

julia> using MKL_jll

julia> function mkl_dimatcopy(AB)
           ordering = Ref{Cchar}('C')
           trans = Ref{Cchar}('T')
           rows = Ref{Csize_t}(size(AB, 1))
           cols = Ref{Csize_t}(size(AB, 2))
           alpha = Ref{Cdouble}(1.0)
           lda = Ref{Csize_t}(size(AB, 2))
           ldb = Ref{Csize_t}(size(AB, 1))
           @ccall libmkl_rt.mkl_dimatcopy(ordering::Ref{Cchar}, trans::Ref{Cchar}, rows::Ref{Csize_t}, cols::Ref{Csize_t}, alpha::Ref{Cdouble}, AB::Ptr{Cdouble}, lda::Ref{Csize_t}, ldb::Ref{Csize_t})::Cvoid
       end
mkl_dimatcopy (generic function with 1 method)

julia> AB = rand(4, 4)
4×4 Matrix{Float64}:
 0.861029  0.140103   0.697495   0.647796
 0.151458  0.0981793  0.698197   0.165531
 0.877205  0.373111   0.0916501  0.0779025
 0.380552  0.627333   0.801185   0.11526

julia> mkl_dimatcopy(AB)

julia> AB
4×4 Matrix{Float64}:
 0.861029  0.151458   0.877205   0.380552
 0.140103  0.0981793  0.373111   0.627333
 0.697495  0.698197   0.0916501  0.801185
 0.647796  0.165531   0.0779025  0.11526

Sounds like this entirely an MKL bug (at least in their documentation)

giordano · December 3, 2024, 2:33am

NatSpring:

$ nm -D /opt/intel/oneapi/mkl/latest/lib/libmkl_rt.so | grep mkl_dimatcopy
00000000005ff510 T mkl_dimatcopy
00000000005ff510 T mkl_dimatcopy_
00000000005ff640 T mkl_dimatcopy_batch
00000000005ff640 T mkl_dimatcopy_batch_
00000000005ff770 T mkl_dimatcopy_batch_strided
00000000005ff770 T mkl_dimatcopy_batch_strided_

In hindsight, the fact that both the C and the Fortran symbols point to the same address should have been a giveaway that all arguments must be pointers (only way that’d work in Fortran), despite what’s written in the documentation.

NatSpring · December 3, 2024, 6:16am

No it does not.

I tried the following:

Write a minimal .c program containing only this code:

#include <mkl.h>

void dimatcopyWrapper(const char ordering, const char trans, size_t rows, size_t cols, const double alpha, double* AB, size_t lda, size_t ldb) {
	mkl_dimatcopy(ordering, trans, rows, cols, alpha, AB, lda, ldb);
}

Compile the .c program into an .so file (debug.so).
Call my own debug.so file from Julia with ccall like this:

const mySoFile = "PATH_TO_debug.so"

# Random 10 x 10 matrix to transpose.
AB = rand(10, 10)

# If dimatcopy is successful the transposed matrix should equal this.
correctAnswer = collect(AB')

# Call mkl_dimatcopy to transpose the AB matrix using the .so file obtained from compiling my own .c program.
ccall((:dimatcopyWrapper, mySoFile), Cvoid,
    (
        Cchar,        # ordering ('C')
        Cchar,        # trans ('T')
        Csize_t,      # rows (10)
        Csize_t,      # cols (10)
        Cdouble,      # alpha (1.0)
        Ptr{Cdouble}, # AB
        Csize_t,      # lda (10)
        Csize_t       # ldb (10)
    ),
    'C', 'T', 10, 10, 1.0, AB, 10, 10 )

# Verify that mkl_dimatcopy produces the correct output.
@assert(all(AB .== correctAnswer))

For me this works without any issues. I do not have to do any Fortran/pointer reformatting, I just pass the variables directly with ccall as they are to my own function which in turn passes the variables to mkl_dimatcopy.

My own .c function (dimatcopyWrapper) has the exact same syntax as mkl_dimatcopy according to Intel’s documentation:

void mkl_dimatcopy (const char ordering, const char trans, size_t rows, size_t cols, const double alpha, double * AB, size_t lda, size_t ldb);

That is why I can not understand what is going on. Using ccall to call mkl_dimatcopy via the detour of my own wrapper function (dimatcopyWrapper) which does nothing extra at all works, but calling mkl_dimatcopy directly with ccall using libmkl_rt.so segfaults.

This issue is also not limited to mkl_dimatcopy, I have tried two routines from the “BLAS-like extensions” of MKL: mkl_dimatcopy and mkl_domatadd. Both of these cause a segfault when I ccall them from Julia, but routines that are part of regular BLAS (I have tried level 1, 2 and 3 routines), LAPACK, IPP etc. all work fine for me from Julia.

giordano · December 3, 2024, 8:16am

That’s not what I meant. I was suggesting to write a pure C program (no Julia involved at all) which does the same as your Julia code. I was going to try it myself, but before that I googled “mkl dimatcopy segfault” and I found the thread on the Intel forum where someone was having the exact same problem using a pure C/C++ code and the answer from an Intel engineer was to use pointers everywhere. Sounded to me like that settled it. I’m not going to speculate what’s going on a closed source proprietary library (although I’m sure someone may be able to guess something by looking at the assembly code).

giordano · December 3, 2024, 8:31am

NatSpring:

#include <mkl.h>

void dimatcopyWrapper(const char ordering, const char trans, size_t rows, size_t cols, const double alpha, double* AB, size_t lda, size_t ldb) {
	mkl_dimatcopy(ordering, trans, rows, cols, alpha, AB, lda, ldb);
}

Instead of compiling this code, try to only preprocess it, you’ll see that mkl_dimatcopy is turned into MKL_Dimatcopy, so at very least there are some macros that are doing some magics in the the MKL headers, but the point is that the proper signature of the mkl_dimatcopy symbol isn’t what’s officially documented. This is all what this discussion is about (did you try to read it?). As I said already, if that’s exactly the same as the Fortran function then it probably must take all pointer arguments.

Well, in the end I did, didn’t I?

NatSpring · December 3, 2024, 9:27am

I finally got the direct ccall into libmkl_rt.so to work now! You were correct, I needed to wrap every single input argument in Ref{}.

Thank you for sharing your wisdom, I have been struggling with this issue for days!

For future people who may encounter the same issue as I did, here is the revised ccall into libmkl_rt.so that works:

# Path to MKL library file.
const libmklSo = "/opt/intel/oneapi/mkl/latest/lib/libmkl_rt.so"

# Random 10 x 10 matrix to transpose.
AB = rand(10, 10)

# Input arguments as references.
ordering = Ref{Cchar}('C')
trans = Ref{Cchar}('T')
rows = Ref{Csize_t}(10)
cols = Ref{Csize_t}(10)
alpha = Ref{Cdouble}(1.0)
lda = Ref{Csize_t}(10)
ldb = Ref{Csize_t}(10)

# Call the mkl_dimatcopy routine with all arguments passed by reference.
ccall((:mkl_dimatcopy, libmklSo), Cvoid,
    (
        Ref{Cchar},   # ordering ('C')
        Ref{Cchar},   # trans ('T')
        Ref{Csize_t}, # rows (10)
        Ref{Csize_t}, # cols (10)
        Ref{Cdouble}, # alpha (1.0)
        Ptr{Cdouble}, # AB
        Ref{Csize_t}, # lda (10)
        Ref{Csize_t}, # ldb (10)
    ),
    ordering, trans, rows, cols, alpha, AB, lda, ldb)

Topic		Replies	Views
Calling Intel MKL as an external library on Mac General Usage ccall , mkl	2	1238	January 24, 2020
Problem when loading Intel MKL shared library to use with C written shared library General Usage ccall , mkl	9	2990	October 12, 2021
Using MKL, MKLSparse causes error New to Julia	1	148	April 4, 2024
Call Intel MKL as external library General Usage	6	4108	July 17, 2020
Segmentation fault on ccall General Usage ccall , bug , debug	6	1633	November 11, 2017

Segmentation fault when using ccall to call a BLAS-like extension function from Intel MKL

Related topics