ANN: MKLSparse



Hello everyone,

I recently updated the MKLSparse.jl package to support Julia 0.5 and 0.6.

The most useful feature of MKLSparse is likely the ability to seamlessly accelerate sparse matrix-vector multiplications (which are the main workhorse in iterative solvers). Using a representative matrix for benchmarking, I get the following timings:

julia> @time for i in 1:1000 A_mul_B!(c,K,b) end;
  2.901099 seconds (18.45 k allocations: 994.534 KiB)

julia> using MKLSparse

julia> @time for i in 1:1000 A_mul_B!(c,K,b) end;
  0.877888 seconds (31.31 k allocations: 1.641 MiB)

where we can see that performance is greatly increased by just loading MKLSparse (results will vary depending on the system this is run on).
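For anyone wanting to reproduce this: `c`, `K`, and `b` were not defined in the post, so here is a minimal self-contained setup for Julia 0.6. The random sparse matrix is a stand-in of my choosing, not the representative matrix used for the timings above, so absolute numbers will differ:

```julia
# Random sparse stand-in for the benchmark matrix (the original post used
# a representative FEM-style matrix instead).
K = sprand(100000, 100000, 1e-4)
b = rand(size(K, 2))
c = similar(b)

# A_mul_B!(c, K, b) computes c = K*b in place (Julia 0.6 API).
@time for i in 1:1000
    A_mul_B!(c, K, b)
end

using MKLSparse  # loading the package routes the same call through MKL

@time for i in 1:1000
    A_mul_B!(c, K, b)
end
```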

A bonus with the new version of MKLSparse is that there is no longer a need to build Julia with MKL to use it. Instead, it is enough to have MKL installed and the paths correctly set for the package to work.
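As an illustration of what "paths correctly set" can mean on Linux (the `/opt/intel` prefix is Intel's default install location and an assumption here):

```shell
# Source Intel's environment script so the MKL libraries are found at
# runtime; adjust the prefix if MKL is installed elsewhere.
source /opt/intel/mkl/bin/mklvars.sh intel64

# Or point the dynamic loader at the MKL library directory directly:
export LD_LIBRARY_PATH="/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH"
```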

While the DSS (Direct Sparse Solver) interface is not yet wrapped, the package Pardiso.jl can instead be used to solve general sparse systems using MKL.

// Kristoffer


Hi, thanks for MKLSparse. Is it normal if I see only 3% speed gains? I am on an i7-6700 laptop. The total number of allocations and the memory used are in the same order of magnitude as in your case, but the final wall times are almost the same :confused:



Will it work with JuliaPro MKL edition out of the box?

Does the JuliaPro MKL Edition use MKL for Sparse Matrices to begin with?


Perhaps MKL fails to be used at all; are the tests passing? Does the CPU usage indicate that multiple cores are used? What if you try larger matrices than in my first post?

It should, but I haven’t tried it.

I don’t think so, no.


It would be great if it worked with JuliaPro MKL Edition out of the box by utilizing the MKL packaged by Julia.
Same holds for PARDISO.jl.

By the way, thank you for both!


Hello Kristoffer, following your questions, I checked some stuff.

  • All tests are passing
  • The “julia” process shows ~499% CPU usage, so multiple cores are being used.

I repeated the tests, but this time I generated the “representative matrix” with the function linked in the post, using the parameter 100 instead of 60 in getDivGrad(n,n,n).
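For readers without the linked benchmark code: `getDivGrad(n1,n2,n3)` builds the standard 7-point finite-difference Laplacian (divergence of the gradient) on an n1×n2×n3 grid. The sketch below is my own reconstruction via Kronecker products in Julia 0.6 syntax, not the linked source:

```julia
# 1D second-difference operator (n×n tridiagonal matrix).
ddx(n) = spdiagm((-ones(n - 1), 2 * ones(n), -ones(n - 1)), (-1, 0, 1))

# 3D div-grad (Laplacian) on an n1×n2×n3 grid as a Kronecker sum of
# 1D operators along each axis.
function getDivGrad(n1, n2, n3)
    kron(speye(n3), kron(speye(n2), ddx(n1))) +
    kron(speye(n3), kron(ddx(n2), speye(n1))) +
    kron(ddx(n3), kron(speye(n2), speye(n1)))
end
```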

The situation is now worse: using MKL makes it slower, going from ~9.5 to ~11 seconds, while using multiple cores.

 11.141514 seconds (24.98 k allocations: 1.296 MiB)

I am using Julia 0.6.2:

Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

I have been using Pardiso.jl with MKL with very good results on this same computer, which makes these results even stranger.

I also have a Julia build that was compiled with MKL installed, and the situation there is very similar.

I also tried with and without “# export JULIA_NUM_THREADS=4” in my .bashrc file, with no difference.
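Worth noting here: `JULIA_NUM_THREADS` only controls Julia's own threads; MKL reads its own environment variables. A sketch (the value 4 is purely illustrative):

```shell
# MKL's thread count is governed by MKL_NUM_THREADS (or OMP_NUM_THREADS
# as a fallback); JULIA_NUM_THREADS has no effect on MKL's threading.
export MKL_NUM_THREADS=4
export OMP_NUM_THREADS=4
```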

Any hint would be quite appreciated :slight_smile:



Not sure. I just tried on my Mac (i7-4770HQ) and get the following (running the timings twice):

julia> @time for i in 1:1000 A_mul_B!(c,K,b) end;
  2.716284 seconds

julia> using MKLSparse

julia> @time for i in 1:1000 A_mul_B!(c,K,b) end;
  1.115627 seconds

For large matrices the speedup is smaller but still significant:

julia> K = getDivGrad(100,100,100); b = rand(size(K,1)); c = similar(b);

julia> @time for i in 1:1000 A_mul_B!(c,K,b) end;
 12.858318 seconds

julia> using MKLSparse

julia> @time for i in 1:1000 A_mul_B!(c,K,b) end;
  8.882692 seconds