OpenBLAS is faster than Intel MKL on AMD Hardware (Ryzen)

Thanks, added them.

Personally, I’ve been using Julia built with OpenBLAS because I got tired of issues such as ARPACK.jl not working. The _jlls are convenient for benchmark scripts though, because (a) they mean I don’t need to build Julia with MKL, and (b) someone else can run the script without me needing to make any assumptions or checks for what BLAS.vendor() returns.

I would consider using MKL_jll as a dependency in my own libraries though, because its performance (especially multi-threaded) is remarkable. However, as I assume it doesn’t work on ARM, that would force me to do a lot of special casing. And as Apple is moving to ARM, we’ll soon be seeing a lot more of it.
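To sketch what that special casing might look like (this is a hypothetical flag, not anything MKL_jll actually provides): `Sys.ARCH` reports the host architecture, so a package could fall back to OpenBLAS on anything that isn't x86_64.

```julia
# Hedged sketch of architecture-based special casing; `use_mkl` is a
# hypothetical flag, not part of MKL_jll's API.
# Sys.ARCH is a Symbol such as :x86_64 or :aarch64.
const use_mkl = Sys.ARCH === :x86_64

backend() = use_mkl ? "MKL_jll" : "OpenBLAS fallback"
```

Every such branch has to be maintained and tested on both architectures, which is the cost I'd rather avoid.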

If you want to estimate flops:

M = K = N = 16_000;
A = rand(M, K); B = rand(K, N); C = Matrix{Float64}(undef, M, N);
dgemmkl!(C, A, B);                  # warm-up run, so compilation isn't timed
time = @elapsed dgemmkl!(C, A, B);
2e-9M * K * N / time

This would yield GFLOPS: a dense matrix product does 2·M·K·N flops (one multiplication and one addition per term), and the 2e-9 folds in the conversion to giga.

Using your reported times with 1_000x1_000 matrices and 8 threads:

julia> M = K = N = 1000
1000

julia> 2e-9M * K * N / 7.374e-3
271.2232167073502

julia> 2e-9M * K * N / 9.142e-3
218.7705097352877

Versus the 431 GFLOPS you saw with your Julia 1.4 installed via apt. Efficiency should keep improving as the matrices grow, because the O(N³) compute increasingly amortizes the O(N²) memory traffic.
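A quick way to see that scaling, sketched with `mul!` (which dispatches to whatever BLAS your Julia is built with; substitute the MKL wrapper to measure MKL specifically):

```julia
using LinearAlgebra

# Hedged sketch: GFLOPS should climb toward the theoretical peak as N grows.
function gemm_gflops(N)
    A = rand(N, N); B = rand(N, N); C = Matrix{Float64}(undef, N, N)
    mul!(C, A, B)                     # warm-up run (compilation, page faults)
    t = @elapsed mul!(C, A, B)
    return 2e-9 * N^3 / t             # 2 flops per term of the N^3 products
end

for N in (500, 1_000, 2_000, 4_000)
    println(N, " => ", round(gemm_gflops(N), digits = 1), " GFLOPS")
end
```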

The theoretical peak of your CPU is:

julia> GHz = 4.25 # clock cycles per nanosecond
4.25

julia> ops_per_fma = 8 # AVX2: 256-bit registers hold 4 Float64, so one FMA does 4 multiplications and 4 additions
8

julia> instr_per_clock = 2 # 2 fma per clock cycle
2

julia> cores = 8
8

julia> GHz * ops_per_fma * instr_per_clock * cores
544.0

BLAS should be able to get fairly close to the theoretical peak.
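The arithmetic above can be packaged as a small helper; the SIMD width and flops-per-FMA are the AVX2 double-precision values, and the clock speed and core count are the same assumptions as above:

```julia
# Hedged sketch: theoretical double-precision peak in GFLOPS for an
# AVX2 + FMA core. Defaults: 4 Float64 lanes per 256-bit register,
# 2 flops (mul + add) per FMA, 2 FMA units issuing per cycle.
peak_gflops(ghz, cores; simd_width = 4, flops_per_fma = 2, fma_per_cycle = 2) =
    ghz * simd_width * flops_per_fma * fma_per_cycle * cores

peak_gflops(4.25, 8)  # 544.0, all cores
peak_gflops(4.25, 1)  # 68.0, one core
```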

Also, this lets us confirm that MKL does seem to be using AVX2 on your Ryzen:

julia> 2e-9M * K * N / 34.320e-3 # single-threaded GFLOPS at 1000x1000
58.275058275058285

julia> 4.25 * 8 * 2 # theoretical peak for 1 core
68.0
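To reproduce that single-threaded measurement, you can pin BLAS to one thread so the result is directly comparable to the one-core peak (sketch using `mul!`, which calls whatever BLAS is loaded):

```julia
using LinearAlgebra

# Hedged sketch: restrict BLAS to a single thread, then time one GEMM.
BLAS.set_num_threads(1)
N = 1_000
A = rand(N, N); B = rand(N, N); C = Matrix{Float64}(undef, N, N)
mul!(C, A, B)                         # warm-up run
t = @elapsed mul!(C, A, B)
single_core_gflops = 2e-9 * N^3 / t   # compare against the one-core peak
```

Getting within 80–90% of the single-core peak is a good sign that the AVX2 FMA units are actually being used.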