How to utilize "MKLSparse.jl"?

I have some sparse matrices, and I thought using “MKLSparse.jl” would speed up execution, but it did not. However, the documentation contains this note:

The integer type that should be used in order for MKL to be called is the same as used by the Julia BLAS library, see Base.USE_BLAS64.
I am not sure whether this is the cause. If it is, how do I set it, given that I am using VS Code?
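One way to check the condition quoted above is to compare the index type of a sparse matrix with the integer type that Julia's BLAS wrappers use. This is a minimal sketch that assumes a 64-bit Julia. `LinearAlgebra.BlasInt` is the integer type Julia's BLAS wrappers use (it follows `Base.USE_BLAS64`), and the matrix `S` is only an illustrative example, not the poster's data.

```julia
using LinearAlgebra, SparseArrays

# Integer type used by Julia's BLAS wrappers: Int64 when Base.USE_BLAS64 is true (ILP64).
blas_int = LinearAlgebra.BlasInt
println("Base.USE_BLAS64 = ", Base.USE_BLAS64, ", BlasInt = ", blas_int)

# Illustrative matrix; sprand uses Int (Int64 on 64-bit Julia) for its indices.
S = sprand(1_000, 1_000, 0.01)
idx_type = eltype(rowvals(S))
println("Sparse index type = ", idx_type)

# If the index types differ (e.g. a matrix built with Int32 indices),
# convert it so that its indices use the BLAS integer type.
S_blas = idx_type == blas_int ? S : SparseMatrixCSC{eltype(S), blas_int}(S)
println("Converted index type = ", eltype(rowvals(S_blas)))
```

On a standard 64-bit Julia both types are usually `Int64`, in which case no conversion is needed.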

julia> BLAS.lbt_get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] mkl_rt.2.dll
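The same information can also be read programmatically, which is handy when checking a setup from a script. This sketch assumes Julia 1.7 or later (libblastrampoline). The field names `loaded_libs`, `libname` and `interface` come from the `LBTConfig` and `LBTLibraryInfo` structs that `BLAS.lbt_get_config()` returns.

```julia
using LinearAlgebra

cfg = BLAS.lbt_get_config()

# Each loaded library reports its file name and integer interface (:ilp64 or :lp64).
for lib in cfg.loaded_libs
    println(basename(lib.libname), " => ", lib.interface)
end

# true if any loaded BLAS library uses 64-bit integers (ILP64), as mkl_rt does above
has_ilp64 = any(lib -> lib.interface === :ilp64, cfg.loaded_libs)
println("ILP64 BLAS loaded: ", has_ilp64)
```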

Execution of what exactly?

It works fine for me for a sparse-dense matrix multiplication:

julia> using BenchmarkTools

julia> using SparseArrays

julia> S = sprand(10_000, 10_000, 0.01);

julia> D = rand(10_000, 10_000);

julia> @btime $S * $D;
  10.275 s (2 allocations: 762.94 MiB)

julia> using MKLSparse

julia> @btime $S * $D;
  485.737 ms (2 allocations: 762.94 MiB)
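To reproduce this kind of comparison on a smaller problem without BenchmarkTools, here is a rough sketch using `@elapsed` that also checks the sparse product against a dense reference. The size, density and seed are illustrative assumptions. Run it once in a fresh session and again after `using MKLSparse` to compare the timings.

```julia
using SparseArrays, LinearAlgebra, Random

Random.seed!(1)
n = 500
S = sprand(n, n, 0.01)   # sparse matrix, 1% density
D = rand(n, n)           # dense matrix

S * D                    # warm-up call so compilation is not included in the timing
t = @elapsed S * D
println("sparse * dense: ", round(t * 1000; digits = 2), " ms")

# The result should match the dense product up to floating-point rounding.
C = S * D
C_ref = Matrix(S) * D
println("matches dense product: ", C ≈ C_ref)
```

A single `@elapsed` measurement is noisy. `@btime` from BenchmarkTools, as used above, gives more reliable numbers.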
  • My results without MKLSparse are similar to yours, but yours are much faster with it. Any idea why?
  • Do my results below mean that MKLSparse is working correctly in my case?
julia> @btime $S * $D;
  10.912 s (2 allocations: 762.94 MiB)

julia> using MKLSparse

julia> @btime $S * $D;
  2.834 s (2 allocations: 762.94 MiB)
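To compare the two sets of timings directly, compute the speedup from loading MKLSparse as the ratio of the time without it to the time with it. The numbers below are copied from the two benchmarks above.

```julia
# Speedup = time without MKLSparse / time with MKLSparse (seconds, from the posts above)
speedup(before, after) = before / after

answer_speedup = speedup(10.275, 0.485737)  # first benchmark
user_speedup   = speedup(10.912, 2.834)     # second benchmark

println("first benchmark:  ", round(answer_speedup; digits = 1), "x")
println("second benchmark: ", round(user_speedup; digits = 1), "x")
```

So MKLSparse gives roughly a 3.9x speedup in the second case, compared with about 21x in the first.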

I’d assume yes, but it’s hard to say given how little information you’ve provided. For example, it would help to know which CPU you’re running on. (My CPU probably has more threads etc. than yours.)

Thanks for your reply. Please find my CPU details below.

Intel(R) Core™ i7-10750H CPU @ 2.60GHz 2.59 GHz

julia> Sys.cpu_info()
12-element Vector{Base.Sys.CPUinfo}:
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz: 
        speed         user         nice          sys         idle          irq
     2592 MHz    6221562            0      9454234    218662796      3367234  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz: 
        speed         user         nice          sys         idle          irq
     2592 MHz    6820937            0      5096546    222420734       148750  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2592 MHz   10452000            0      5743921    218142296       112703  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2592 MHz    9861718            0      4013812    220462687        82734  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2592 MHz   12091000            0      4026343    218220875        37468  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2592 MHz   13246093            0      4133781    216958343        38250  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2592 MHz   15318750            0      4460359    214559125        34562  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2592 MHz   15330390            0      4278515    214729312        39500  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2592 MHz    7359000            0      7165921    219813296       102062  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2592 MHz    6830562            0      4275578    223232078        99515  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2592 MHz    7758718            0      4911546    221667953        89359  ticks
 Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2592 MHz    7871406            0      4968968    221497843       172296  ticks
julia> Threads.nthreads()
6
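Instead of the full `Sys.cpu_info()` dump, you can print a compact summary of the relevant hardware and thread settings with `versioninfo()` (from the InteractiveUtils standard library) plus a few `Sys` queries. This is a sketch, and the exact output depends on the machine and Julia version.

```julia
using InteractiveUtils   # provides versioninfo(); already loaded in the REPL

versioninfo()            # Julia version, OS, CPU model, thread information, etc.

println("CPU model:           ", Sys.cpu_info()[1].model)
println("Logical CPU threads: ", Sys.CPU_THREADS)
println("Julia threads:       ", Threads.nthreads())
```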

I changed the number of threads to 12, but I still get the same performance.

julia> Threads.nthreads()
12
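`Threads.nthreads()` counts Julia's own threads, while BLAS libraries such as MKL manage a separate thread pool. As a sketch, assuming the sparse kernels MKLSparse calls honour MKL's thread setting (worth verifying), you can query and change that pool through `BLAS.get_num_threads` and `BLAS.set_num_threads`. Using the number of physical cores (6 on an i7-10750H) rather than logical threads is a common starting point, but it is an assumption to benchmark, not a rule.

```julia
using LinearAlgebra

println("Julia threads: ", Threads.nthreads())      # set via `julia -t N` or JULIA_NUM_THREADS
println("BLAS threads:  ", BLAS.get_num_threads())   # threads used by the loaded BLAS (e.g. MKL)

# Hypothetical choice: one BLAS thread per physical core (6 on an i7-10750H).
physical_cores = 6
BLAS.set_num_threads(physical_cores)
println("BLAS threads after change: ", BLAS.get_num_threads())
```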

Is there anything else I should check to find the reason for the slow performance?
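Another thing to check is whether the multiplication is actually routed to MKL. This sketch uses `@which` from InteractiveUtils to report the method (and its module) that Julia selects for an in-place sparse-dense `mul!`. It assumes that MKLSparse works by adding its own `mul!` methods for `SparseMatrixCSC`, which is an assumption about its internals, and the exact signature it overrides may differ between versions. Under that assumption, running this before and after `using MKLSparse` should report a method defined in MKLSparse the second time.

```julia
using LinearAlgebra, SparseArrays, InteractiveUtils

S = sprand(100, 100, 0.05)
D = rand(100, 100)
C = similar(D)

# Method Julia dispatches to for C = S * D (5-argument form: C = α*S*D + β*C)
m = @which mul!(C, S, D, true, false)
println(m)
println("defined in module: ", m.module)
```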