My understanding is that all you need to do is type
using MKL
in the REPL and the BLAS trampoline will take care of the rest. When I do that, I see no difference in performance.
What should I be doing? I’m on an Intel Mac and MKL.jl installed with no complaints.
What platform are you using?
What does LinearAlgebra.BLAS.get_config() say for you after you load MKL?
julia> using LinearAlgebra.BLAS
julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libopenblas64_.so
julia> using MKL
julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libmkl_rt.so
Looks like it’s doing the right thing. Seems like I need more compelling benchmarks.
Thanks,
julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libopenblas64_.0.3.13.dylib
julia> using MKL
julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libmkl_rt.1.dylib
FWIW, on my MacBook Pro (Intel):
julia> using BenchmarkTools
julia> using LinearAlgebra
julia> BLAS.set_num_threads(1)
julia> A = rand(1000,1000); B = rand(1000,1000);
julia> @btime $A * $B;
53.150 ms (2 allocations: 7.63 MiB)
julia> @btime inv($A);
79.604 ms (5 allocations: 8.13 MiB)
julia> using MKL
julia> BLAS.set_num_threads(1)
julia> @btime $A * $B;
51.048 ms (2 allocations: 7.63 MiB)
julia> @btime inv($A);
62.249 ms (5 allocations: 9.35 MiB)
(UPDATED because I didn’t set the number of threads to one for MKL.)
Interesting. Your matrix-matrix product timings are like what I got with a different example. I did not try inv and am somewhat surprised at what you found. MKL may not be as amazing as I expected.
You’re setting num threads to 1 with OpenBLAS but not MKL?
julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libopenblas64_.so
julia> BLAS.set_num_threads(1)
julia> A = rand(1000,1000); B = rand(1000,1000);
julia> @btime $A * $B;
45.090 ms (2 allocations: 7.63 MiB)
julia> using MKL
julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libmkl_rt.so
julia> @btime $A * $B;
14.268 ms (2 allocations: 7.63 MiB)
julia> BLAS.set_num_threads(1)
julia> @btime $A * $B;
45.732 ms (2 allocations: 7.63 MiB)
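The timings above suggest that a thread count set before using MKL does not carry over to the newly loaded backend, so it has to be set again after the switch. A minimal sketch of a helper that re-applies the thread count every time it measures (time_matmul is a hypothetical name, and it uses @elapsed rather than BenchmarkTools to stay self-contained):

```julia
using LinearAlgebra

# Hypothetical helper: time A * B with the BLAS thread count
# re-applied at call time, so it holds for whichever backend is active.
function time_matmul(A, B; nthreads = 1)
    BLAS.set_num_threads(nthreads)
    A * B                      # warm-up run, excludes compilation time
    return @elapsed A * B      # seconds for one product
end

A = rand(200, 200); B = rand(200, 200)
t = time_matmul(A, B)
println("single-threaded A*B: ", round(t * 1e3, digits = 3), " ms")
```

Calling time_matmul once before and once after using MKL should then compare both backends at the same thread count. A single @elapsed sample is noisy, so @btime is still the better choice for real numbers.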
Indeed, my bad. Corrected it above.
Is there any way I can use the output of BLAS.get_config() within a program to see if I’m using MKL or not?
Something along the lines of any(contains("mkl"), getfield.(BLAS.get_config().loaded_libs, :libname)) should do the job.
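For use inside a program, the same check can be wrapped in a small function. This is only a sketch: using_mkl is a hypothetical name, and it assumes the MKL library file name contains "mkl", as in the get_config() outputs earlier in the thread.

```julia
using LinearAlgebra

# Hypothetical helper: true if any library currently loaded through
# libblastrampoline has "mkl" in its file name.
function using_mkl()
    libs = BLAS.get_config().loaded_libs
    # Compare only the file name, lowercased, so directory names
    # and capitalization do not affect the result.
    return any(lib -> occursin("mkl", lowercase(basename(lib.libname))), libs)
end

@show using_mkl()
```

This should report false in a fresh session and true after using MKL, matching what BLAS.get_config() prints.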