Call Intel MKL as external library


#1

I wrote a script to test a call to Intel MKL a external library but it doesn’t work.
The concept: Solve a linear system using MKL “dgetrs” (https://software.intel.com/en-us/node/520892)

# Test MKL solve linear systems

N = 10;
A = rand(N,N);
AA = copy(A);
b = rand(N,1);
c = copy(b);

function juliaSol(A,b)
 \(A,b);
end
sol = sol = juliaSol(A,b);  #Solution of the linear system

# Test MKL
const global librt = Libdl.find_library(["libmkl_rt"], ["/opt/intel/mkl/lib"])

# Open librt
Libdl.dlopen(librt)

# 101  = LAPACK_ROW_MAJOR
# 102 = LAPACK_COL_MAJOR
function linSOLVE(A::StridedMatrix{Float64}, b::StridedMatrix{Float64})
  m, n = size(A)
  lda  = max(1,stride(A, 2))
  ipiv = similar(A, Int64, min(m,n))

  #lapack_int LAPACKE_dgetrs (int matrix_layout , char trans , lapack_int n ,
  # lapack_int nrhs , const double * a , lapack_int lda ,
  # const lapack_int * ipiv , double * b , lapack_int ldb );
  dd = ccall(("LAPACKE_dgetrs", librt),
              Cint, # Return type
                (Int64, Cuchar, Int64,
                Int64,  Ptr{Float64}, Int64,
                Ptr{Int64},  Ptr{Float64}, Int64),
                102, 'N', m,
                1, A, lda,
                ipiv, b, m
                )
  dd, b, ipiv # dd = 0 if calculation is done, b = output
end

dd, sol2, piv = linSOLVE(AA,c)

The script run with an error: Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP
and the solution of the linear system is completely wrong.

An example of ccal of a BLAS can be seen here: https://github.com/JuliaLang/julia/blob/master/base/linalg/lapack.jl#L528

There is any expert (or developer) that can help me to use correctly the MKL ccall ?


#2

Do you have MKL set to use 64 bit integers (ILP64)?


#3

No, i haven’t set anything…

But i loaded the wrong version of the script… and now it work!
Here the working version:

# Test MKL solve linear systems

N = 10000;
A = rand(N,N);
AA = copy(A);
b = rand(N,1);
c = copy(b);

function juliaSol(A,b)
 \(A,b);
end
@time sol = juliaSol(A,b);

# Test MKL
const global librt = Libdl.find_library(["libmkl_rt"], ["/opt/intel/mkl/lib"])

# Open librt
Libdl.dlopen(librt)
# LU FACORIZATION
function luFactMKL(A::StridedMatrix{Float64})
  m, n = size(A)
  lda  = max(1,stride(A, 2))
  ipiv = similar(A, Int64, min(m,n))

# lapack_int LAPACKE_dgetrf (int matrix_layout , lapack_int m , lapack_int n ,
# double * a , lapack_int lda , lapack_int * ipiv );
  dd = ccall(("LAPACKE_dgetrf",  librt), Cint,
            (Int64, Int64, Int64,
            Ptr{Float64}, Int64, Ptr{Int64}),
            102, m, n,
            A, lda, ipiv
  )
  A, ipiv
end

@time AA,pivot = luFactMKL(AA)

# 101  = LAPACK_ROW_MAJOR
# 102 = LAPACK_COL_MAJOR
function linSOLVE(A::StridedMatrix{Float64}, b::StridedMatrix{Float64}, ipiv::StridedVector{Int64})
  m, n = size(A)
  lda  = max(1,stride(A, 2))
  #lapack_int LAPACKE_dgetrs (int matrix_layout , char trans , lapack_int n ,
  # lapack_int nrhs , const double * a , lapack_int lda ,
  # const lapack_int * ipiv , double * b , lapack_int ldb );
  dd = ccall(("LAPACKE_dgetrs", librt),
              Cint, # Return type
                (Int64, Cuchar, Int64,
                Int64,  Ptr{Float64}, Int64,
                Ptr{Int64},  Ptr{Float64}, Int64),
                102, 'N', m,
                1, A, lda,
                ipiv, b, m
                )
  dd, b, ipiv # dd = 0 if calculation is done, b = output
end

@time dd, sol2, piv = linSOLVE(AA,c, pivot)

Using Intel MKL dgetrf + dgtrs julia is as fast as Matlab ! :wink:

On my PC: with A = rand(10000,10000);

Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: NT (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)

Mkl "dgetrf" Factorization time 
 12.180575 seconds (4.16 k allocations: 261.296 KB)

JULIA LAPACK factorization time 
 38.910369 seconds (10 allocations: 78.469 KB)

LAPACK is very memory efficient… but MKL is 3 times faster (and use 3 times more memory)


Linear solver \(A, B) performance vs Matlab A\b
#4

Interesting. Is this with threading on in both OpenBLAS and MKL?

Also, is it possible to use this mechanism to replace dynamically all LAPACK calls from OpenBLAS to MKL?


#5

I’ve setted anything on both OpenBLAS or MKL. OpenBLAS is as well as it is after a standard installation of Julia, MKL is as well as it is after the MKL intallation on the OS.

OpenBLAS:
ccall((:openblas_get_num_threads64_, Base.libblas_name), Cint, ())
4 # Thread

MKL: MKL automatically tune the number of threads

If you want use MKL for every BLAS function call, the more convenient way is build julia with MKL. In this way every BLAS function will be handled by MKL.