Hi everyone,

I’ve ended up having to compute a lot of large dense non-Hermitian eigenproblems on an HPC cluster that uses AMD EPYC Milan CPUs. I thought I would use Intel MKL, since it usually does a better job for `zgeev` (and presumably many other LAPACK routines) than OpenBLAS.

It has been known for a while that MKL deliberately selects slower code paths on AMD CPUs. First, there was the `MKL_DEBUG_CPU_TYPE` environment variable, which Intel removed in later versions of MKL. Fortunately, there exists a workaround: preloading a fake library using `LD_PRELOAD`, as described here. Unfortunately, this trick does not work for Julia, as discussed in a related post.

Let me illustrate the potential performance difference.

We can trick MKL using Julia v1.6.7 built from source with `USE_INTEL_MKL=1`. Then, using the `LD_PRELOAD` hack gives

```
julia> using LinearAlgebra
julia> BLAS.set_num_threads(8)
julia> n=200; a = randn(ComplexF64,n,n);
julia> @time eigen(a);
0.835277 seconds (1.67 M allocations: 102.661 MiB, 1.57% gc time, 78.93% compilation time)
julia> n=2000; a = randn(ComplexF64,n,n);
julia> @time eigen(a);
5.833620 seconds (26 allocations: 133.959 MiB, 0.40% gc time)
```

Without the `LD_PRELOAD`:

```
julia> using LinearAlgebra
julia> BLAS.set_num_threads(8)
julia> n=200; a = randn(ComplexF64,n,n);
julia> @time eigen(a);
0.740193 seconds (1.67 M allocations: 102.664 MiB, 2.00% gc time, 90.19% compilation time)
julia> n=2000; a = randn(ComplexF64,n,n);
julia> @time eigen(a);
10.117982 seconds (26 allocations: 133.959 MiB, 0.21% gc time)
```

And using Julia v1.9 with `MKL.jl` (with or without `LD_PRELOAD`):

```
julia> using MKL
julia> using LinearAlgebra
julia> BLAS.set_num_threads(8)
julia> n=200; a = randn(ComplexF64,n,n);
julia> @time eigen(a);
1.192825 seconds (1.67 M allocations: 109.709 MiB, 6.84% gc time, 89.60% compilation time)
julia> n=2000; a = randn(ComplexF64,n,n);
julia> @time eigen(a);
9.663940 seconds (26 allocations: 133.974 MiB, 0.08% gc time)
```

For completeness, Julia v1.9 with OpenBLAS:

```
julia> using LinearAlgebra
julia> BLAS.set_num_threads(8)
julia> n=200; a = randn(ComplexF64,n,n);
julia> @time eigen(a);
1.312937 seconds (1.67 M allocations: 109.224 MiB, 8.13% gc time, 91.58% compilation time)
julia> n=2000; a = randn(ComplexF64,n,n);
julia> @time eigen(a);
8.914101 seconds (26 allocations: 126.161 MiB, 0.10% gc time)
```

Now, since I would like to avoid using Julia 1.6.7, and since `MKL_jll.jl` exists, what can we do to make the best use of MKL on AMD CPUs? Is there a simple way of changing `MKL.jl` or `MKL_jll.jl` to make the `LD_PRELOAD` hack work, or is there a Julia-internal solution?
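For reference, outside of Julia the usual packaging of this hack is a tiny launcher script that sets `LD_PRELOAD` before the process starts. A sketch (the names `julia-mkl` and `libfakeintel.so` are my assumptions, and as noted above this does not currently help when MKL is loaded via the Julia packages):

```shell
# Hypothetical launcher: export LD_PRELOAD before Julia starts, so any MKL
# library loaded later in the process resolves the interposed symbol first.
cat > julia-mkl <<'EOF'
#!/bin/sh
exec env LD_PRELOAD="$PWD/libfakeintel.so" julia "$@"
EOF
chmod +x julia-mkl
```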

I hope I haven’t missed any discussions on this somewhere.