I just built Julia 1.0.1 on a cluster and linked it against MKL (2019). As a simple benchmark, I compared the performance of squaring a 1000 x 1000 matrix against the official Julia binaries from the website (which use OpenBLAS). I remember doing this for 0.6.4 at some point and finding MKL to be faster by 30% or so. However, I'm blown away by the difference I found this time:
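(For anyone reproducing this: a quick way to double-check which BLAS a given Julia binary actually loaded is sketched below. BLAS.vendor() and Libdl.dllist() are standard calls in 1.0; the regex filter is just for illustration.)

using LinearAlgebra, Libdl

# Which BLAS was Julia built against? Returns :mkl for an MKL build,
# :openblas64 for the official binaries.
LinearAlgebra.BLAS.vendor()

# List the BLAS/MKL shared libraries that are actually loaded in this session.
filter(lib -> occursin(r"blas|mkl"i, lib), Libdl.dllist())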
MKL:
julia> versioninfo()
Julia Version 1.0.1
Commit 0d713926f8* (2018-09-29 19:05 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libimf
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
julia> using LinearAlgebra; LinearAlgebra.versioninfo()
BLAS: libmkl_rt
LAPACK: libmkl_rt
julia> using BenchmarkTools
julia> A = rand(1000,1000);
julia> @btime $A*$A;
1.926 ms (2 allocations: 7.63 MiB)
OpenBLAS:
julia> versioninfo()
Julia Version 1.0.1
Commit 0d713926f8 (2018-09-29 19:05 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
julia> using LinearAlgebra; LinearAlgebra.versioninfo()
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY SkylakeX MAX_THREADS=16)
LAPACK: libopenblas64_
julia> using BenchmarkTools
julia> A = rand(1000,1000);
julia> @btime $A*$A;
7.905 ms (2 allocations: 7.63 MiB)
julia> @btime $A*$A;
7.124 ms (2 allocations: 7.63 MiB)
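For context, a dense n x n matrix product costs roughly 2n^3 floating-point operations, so the timings above translate into rough throughput numbers. This is just back-of-the-envelope arithmetic on the @btime results pasted above:

n = 1000
flops = 2 * n^3              # ~2.0e9 floating-point operations per A*A

gflops(t) = flops / t / 1e9  # t in seconds

gflops(1.926e-3)             # MKL:      ~1038 GFLOPS
gflops(7.124e-3)             # OpenBLAS: ~281 GFLOPS
7.124 / 1.926                # ~3.7x faster with MKL on this test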