How to circumvent Intel's AMD discrimination in MKL from v1.7 onwards?

fgerick · October 17, 2023, 2:41pm

Hi everyone,
I’ve ended up having to solve a lot of large, dense, non-Hermitian eigenproblems on an HPC cluster with AMD EPYC Milan CPUs. I thought I could use Intel MKL, since it usually does a better job than OpenBLAS for zgeev (and presumably many other LAPACK routines).

It has been known for a while that Intel is actively trying to slow down AMD CPUs running its MKL library. First, there was the MKL_DEBUG_CPU_TYPE environment variable, which Intel then removed in later versions of MKL. Fortunately, there is a workaround: preloading a fake library with LD_PRELOAD, as described here. Unfortunately, this trick does not work for Julia, as discussed in a related post.
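For reference, such a fake library is essentially one C function. A minimal sketch, assuming (as in the Julia hack later in this thread) that MKL checks for an Intel CPU through an internal function named mkl_serv_intel_cpu_true. This is not a documented MKL API, the file and library names are hypothetical, and newer MKL releases may no longer consult it:

```c
/* fakeintel.c (hypothetical file name)
 *
 * Assumption: MKL decides whether it runs on an Intel CPU by calling the
 * undocumented internal function mkl_serv_intel_cpu_true(). Defining it
 * here and preloading this library makes that check always report "Intel".
 *
 * Build:  gcc -shared -fPIC -o libfakeintel.so fakeintel.c
 * Use:    LD_PRELOAD=./libfakeintel.so ./your_mkl_program
 */
int mkl_serv_intel_cpu_true(void)
{
    return 1; /* always claim an Intel CPU */
}
```

The idea is that libraries listed in LD_PRELOAD come first in the dynamic linker's symbol search order, so the loader resolves mkl_serv_intel_cpu_true to this definition rather than to MKL's own.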

I illustrate the potential performance difference:

We can trick MKL using Julia v1.6.7 built from source with USE_INTEL_MKL=1. Using the LD_PRELOAD hack then gives

```julia
julia> using LinearAlgebra

julia> BLAS.set_num_threads(8)

julia> n=200; a = randn(ComplexF64,n,n);

julia> @time eigen(a);
  0.835277 seconds (1.67 M allocations: 102.661 MiB, 1.57% gc time, 78.93% compilation time)

julia> n=2000; a = randn(ComplexF64,n,n);

julia> @time eigen(a);
  5.833620 seconds (26 allocations: 133.959 MiB, 0.40% gc time)
```

Without LD_PRELOAD:

```julia
julia> using LinearAlgebra

julia> BLAS.set_num_threads(8)

julia> n=200; a = randn(ComplexF64,n,n);

julia> @time eigen(a);
  0.740193 seconds (1.67 M allocations: 102.664 MiB, 2.00% gc time, 90.19% compilation time)

julia> n=2000; a = randn(ComplexF64,n,n);

julia> @time eigen(a);
 10.117982 seconds (26 allocations: 133.959 MiB, 0.21% gc time)
```

And using Julia v1.9 with MKL.jl (with or without LD_PRELOAD):

```julia
julia> using MKL

julia> using LinearAlgebra

julia> BLAS.set_num_threads(8)

julia> n=200; a = randn(ComplexF64,n,n);

julia> @time eigen(a);
  1.192825 seconds (1.67 M allocations: 109.709 MiB, 6.84% gc time, 89.60% compilation time)

julia> n=2000; a = randn(ComplexF64,n,n);

julia> @time eigen(a);
  9.663940 seconds (26 allocations: 133.974 MiB, 0.08% gc time)
```

For completeness, Julia v1.9 with OpenBLAS:

```julia
julia> using LinearAlgebra

julia> BLAS.set_num_threads(8)

julia> n=200; a = randn(ComplexF64,n,n);

julia> @time eigen(a);
  1.312937 seconds (1.67 M allocations: 109.224 MiB, 8.13% gc time, 91.58% compilation time)

julia> n=2000; a = randn(ComplexF64,n,n);

julia> @time eigen(a);
  8.914101 seconds (26 allocations: 126.161 MiB, 0.10% gc time)
```
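To put the n = 2000 timings in perspective (each is a single @time run, so these ratios are only rough): relative to MKL with the LD_PRELOAD hack on Julia v1.6.7, unpatched MKL on v1.6.7, MKL.jl on v1.9 and OpenBLAS on v1.9 are slower by roughly

$$
\frac{10.118\ \mathrm{s}}{5.834\ \mathrm{s}} \approx 1.73, \qquad
\frac{9.664\ \mathrm{s}}{5.834\ \mathrm{s}} \approx 1.66, \qquad
\frac{8.914\ \mathrm{s}}{5.834\ \mathrm{s}} \approx 1.53 .
$$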

Now, since I would like to avoid using Julia 1.6.7, and since MKL_jll.jl exists, what can we do to make the best use of MKL on AMD CPUs? Is there a simple way to change MKL.jl or MKL_jll.jl so that the LD_PRELOAD hack works, or is there a Julia-internal solution?

I hope I haven’t missed an existing discussion of this somewhere.

fgerick · October 19, 2023, 1:54pm

Okay, I actually found a hacky solution myself for now:

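```julia
using MKL

# Directory of the MKL_jll artifact that MKL.jl loads
mklpath = dirname(MKL.MKL_jll.libmkl_rt_path)

cd(mklpath) do
    # Remove the symlinks (the real .so.2 libraries are left untouched)
    rm("libmkl_rt.so")
    rm("libmkl_core.so")

    # Shim that makes MKL's CPU check always report an Intel CPU
    write("libamdmkl.c", "int mkl_serv_intel_cpu_true() {return 1;}")

    # Rebuild each .so as shim + dependency on the real .so.2,
    # found next to it via the $ORIGIN rpath
    run(`gcc -shared -o libmkl_core.so -Wl,-rpath=''\$ORIGIN'' libamdmkl.c libmkl_core.so.2`)
    run(`gcc -shared -o libmkl_rt.so -Wl,-rpath=''\$ORIGIN'' libamdmkl.c libmkl_rt.so.2`)
    rm("libamdmkl.c")
end
# Restart Julia so that the rebuilt libraries are actually loaded
```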

The original libmkl_rt.so and libmkl_core.so are just symlinks to libmkl_rt.so.2 and libmkl_core.so.2, so we don’t mess things up too much (for those who are worried).
I guess there are many reasons not to do this, or cases where it fails, but it works on my system and probably on many similar ones as well!

EDIT: Actually, the first version of this didn’t work properly. It didn’t link correctly to the libmkl_rt.so.2 library and only appeared to work because I had another MKL version in my library path. I don’t know how to link the library properly without relative paths, so I have updated the code to use relative paths ($ORIGIN in the rpath). Maybe someone will come up with something more elegant!

fgerick · May 16, 2025, 3:55pm

Make sure to pin MKL_jll to version 2024.1.0,

```julia
import Pkg; Pkg.pin(name="MKL_jll", version="2024.1.0")
```

since it seems Intel has caught on to this workaround for AMD MKL performance in later releases.
