How to mitigate bad performance of eigvals from LinearAlgebra

Dear all,

I am trying to port to Julia my Matlab code that calculates a certain band structure. Unfortunately, I am unable to get close to the performance of the Matlab code, for some reason that I don’t understand. I would be very thankful for your help.

My Julia code is the following:

#assert Threads.nthreads() == 8
using BenchmarkTools
using FFTW
using MKL
using LinearAlgebra
LinearAlgebra.BLAS.set_num_threads(1);
FFTW.set_provider!("mkl")

function Hamiltonian(nk, a, v)

    nx = length(v);
    E = zeros(ComplexF64, nk);
    K = 2*π*collect(fftshift(fftfreq(nx, nx)))     # centered wavenumber grid
    Hf = zeros(ComplexF64, nx, nx);
    k = collect(range(-π, π, nk));                 # sampling points in [-π, π]

    # Fourier coefficients of the potential, normalized and centered
    fft!(v)
    fftshift!(v, v/length(v))

    # Toeplitz matrix built from the Fourier coefficients of the potential
    for j=1:nx, i=1:nx
        idx = i - j + floor(Int64, nx/2 + 1)
        if 0 < idx <= nx
            @inbounds Hf[i, j] = v[idx];
        end
    end

    # Diagonalize one matrix per k value; keep the eigenvalue of smallest |real part|
    Threads.@threads for i in eachindex(k)
        e = eigvals(Hermitian(Diagonal(-0.5.*abs.(k[i] .+ K).^a) .+ Hf));
        sort!(e, lt = (x, y) -> abs(real(x)) < abs(real(y)));
        @inbounds E[i] = e[1];
    end

    return E
end

x = range(0, 1, 1001);
@btime H = Hamiltonian(100, 1.5, complex(100*cos.(2*pi*x).*sinh.(-x)));

Julia version is 1.10. I am using MKL explicitly because Matlab uses MKL on Intel processors, and it does give better performance on my system. I have turned off MKL's internal multithreading because parallelism is already achieved by Threads.@threads on the outer loop, and again that gives better performance (a quick sanity check of this setup is sketched right after the Matlab listing below). I have also tried @distributed and got roughly the same results. This code is around 50% slower than its Matlab 2023a counterpart:

function e = Hamiltonian(nk, a, p)

% Fourier coefficients of the potential, normalized and centered
p = fftshift(fft(p))/length(p);
k = linspace(-pi, pi, nk);
L = length(p);
m = fix(L/2);
M = -m:m;
K = 2*pi*M;
Hf = zeros(L, L);

% Toeplitz matrix built from the Fourier coefficients of the potential
[i, j] = meshgrid(1:L, 1:L);
mask = round(i - j + floor(L/2));
Hf(mask >= 0 & mask < L) = p(mask(mask >= 0 & mask < L)+1);

% Diagonalize one matrix per k value; keep the eigenvalue of smallest |real part|
e = zeros(1, nk);
parfor i=1:nk
    H = diag(-0.5*abs(k(i) + K).^a) + Hf;
    E = eig(H, 'vector');
    [~, m] = sort(abs(real(E)));
    E = E(m);
    e(i) = real(E(1));
end

tic
x = linspace(0, 1, 1001);
e = Hamiltonian(100, 1.5, 100*cos(2*pi*x).*sinh(-x));
toc
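
Just as a sanity check on the Julia setup described above, the following sketch (nothing problem-specific, only standard LinearAlgebra queries) confirms the threading and BLAS configuration:

using LinearAlgebra
Threads.nthreads()                      # Julia threads used by the outer k loop (8 on my machine)
LinearAlgebra.BLAS.get_num_threads()    # should report 1, since BLAS threading is disabled
LinearAlgebra.BLAS.get_config()         # lists the loaded BLAS/LAPACK; should show MKL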

The performance of the two is roughly the same for smaller matrices (below about 500x500), but as the matrix becomes larger the gap grows. In general the potential term p, v (the third argument of Hamiltonian) is complex, so I use fft rather than rfft. Also, I know that performance-wise most of the function is irrelevant; only the part where the eigenvalues are computed causes the bad performance. Moreover, these two functions are completely equivalent (i.e. the eigensolver in both cases sees the same matrix and produces the same results).
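
That the eigensolver dominates can be confirmed by timing a single diagonalization in isolation; a rough sketch with a synthetic Hermitian matrix of the same size (not my actual Hamiltonian) looks like:

using LinearAlgebra, BenchmarkTools
LinearAlgebra.BLAS.set_num_threads(1)    # same single-threaded BLAS setting as above

n = 1001
A = Hermitian(rand(ComplexF64, n, n))    # synthetic stand-in for Diagonal(...) .+ Hf

@btime eigvals($A);    # per-k-point eigensolver cost; essentially all of the runtime is spent here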


Hmm, quite confusing! Virtually all the time is spent in diagonalization, so perhaps this could be an MKL versioning issue?

Could you check if the problem persists with a more minimal example such as the following:


using LinearAlgebra, BenchmarkTools

function FakeHamiltonian(Nx, Nk)
    ks = 1:Nk
    E = zeros(Float64, Nk)
    Threads.@threads for i in eachindex(ks)
        # random Hermitian matrix as a stand-in for the real Hamiltonian
        H = rand(ComplexF64, Nx, Nx)
        e = eigvals(Hermitian(H))
        sort!(e, by = abs)
        E[i] = e[1]
    end
    return E
end

Nx = 1001
Nk = 32
@benchmark FakeHamiltonian($Nx, $Nk)

No idea whether this would affect the timings, but a BenchmarkTools best practice is to interpolate non-constant globals, e.g. writing 2*pi*$x and sinh.(-$x), into the expression benchmarked by the macro, in order to avoid type-inference problems from untyped globals that a normal function call wouldn’t have.
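
For example, something along these lines (the same call as above, just with the global x interpolated):

x = range(0, 1, 1001);
@btime Hamiltonian(100, 1.5, complex(100*cos.(2*pi*$x).*sinh.(-$x)));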

so perhaps this could be an MKL versioning issue?

Right on the mark! Even though it was a fresh installation of Julia 1.10 along with its packages, for some baffling reason the MKL.jl version was 0.3.0 (the current release is 0.6.1), and it couldn’t be updated until I removed everything and installed it all again. Who would’ve thought that an old version of MKL would make such a huge performance gap!
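
For anyone hitting the same thing, checking and updating the resolved package version only needs standard Pkg calls, roughly:

using Pkg
Pkg.status("MKL")      # shows which MKL.jl version the active environment has resolved
Pkg.update("MKL")      # tries to move MKL.jl to the newest release allowed by the environment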

As for the minimal example, I couldn’t use random numbers, because then the matrix seen by Matlab would differ from the one seen by Julia, and the eigensolvers would not have been exactly comparable. At any rate, thanks.
