Speed of exp function for skew-Hermitian matrices

I am currently investigating the use of Julia instead of MATLAB for some large quantum spin-dynamics simulations. As with most QM simulations, this requires a large number of matrix exponentials. However, I am seeing considerably slower speeds (about 5x) than MATLAB for 64x64 matrices.

Here is some of my current testing code for a smaller example (8x8), which is still much slower than MATLAB; the difference increases massively with size.

The Julia code looks like this:

a = [0.5  0.0  0.0  0.0   0.5   0.0   0.0   0.0;
     0.0  0.5  0.0  0.0   0.0   0.5   0.0   0.0;
     0.0  0.0  0.5  0.0   0.0   0.0   0.5   0.0;
     0.0  0.0  0.0  0.5   0.0   0.0   0.0   0.5;
     0.5  0.0  0.0  0.0  -0.5   0.0   0.0   0.0;
     0.0  0.5  0.0  0.0   0.0  -0.5   0.0   0.0;
     0.0  0.0  0.5  0.0   0.0   0.0  -0.5   0.0;
     0.0  0.0  0.0  0.5   0.0   0.0   0.0  -0.5]

using BenchmarkTools

@benchmark exp(-1im*pi*a)

This returns:

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   89.900 μs …  3.002 ms  ┊ GC (min … max): 0.00% … 93.60%
 Time  (median):      99.700 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   102.584 μs ± 44.338 μs  ┊ GC (mean ± σ):  0.55% ±  1.33%

    ▁▁▃▇█▆▆▇▇▆▅▄▃▃▃▃▂▂▁▁▁▁▁▁ ▁                                 ▂
  ▅█████████████████████████████▇██▇▇▇▇▇▆▇▆▆▆▆▅▆▄▄▅▅▅▅▅▅▃▄▆▁▅▅ █
  89.9 μs       Histogram: log(frequency) by time       153 μs <

 Memory estimate: 14.34 KiB, allocs estimate: 16.

Whilst the MATLAB code is:

testfun = @() expm(-1i *pi * a)
timeit(testfun)

This returns times of around 20–25 μs.

It might seem silly to be fighting over 75 μs; however, when expanded to 64x64 the discrepancy becomes 1.7 ms (Julia) vs 350 μs (MATLAB). That becomes a major issue when you consider that the exponential is computed about 10^9 times in my full simulation and accounts for nearly all of the computational time.

Any help with this would be appreciated. I would far prefer to code in Julia, as it is a much nicer experience; however, computational speed is more important.

Have you tried using MKL (which MATLAB uses by default)? I see a massive speedup, although it is so large that I don’t trust it:

julia> @btime exp(-1im*pi*a)
  167.400 μs (16 allocations: 14.34 KiB)

julia> using MKL

julia> @btime exp(-1im*pi*$a);
  7.600 μs (16 allocations: 14.34 KiB)

(I have checked however that the result is the same for OpenBLAS and MKL, and the timing difference remains when switching back and forth between backends)
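In case anyone wants to reproduce this: MKL.jl is a regular package, and on Julia 1.7+ it swaps the BLAS/LAPACK backend at load time via libblastrampoline. A minimal sketch (the `BLAS.get_config()` call is only there to confirm which backend is active):

```julia
using Pkg
Pkg.add("MKL")        # one-time install of the MKL.jl package

using MKL             # swaps the BLAS/LAPACK backend to MKL
using LinearAlgebra
BLAS.get_config()     # should now list MKL among the loaded libraries
```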


Have you considered using a specialized algorithm? I haven’t looked into it, but implementing something like [2103.10132] An efficient algorithm to compute the exponential of skew-Hermitian matrices for the time integration of the Schrödinger equation might help (and will 100% be easier in Julia than in MATLAB)


This is awesome and does seem to lead to a massive speed improvement. When testing my 64x64 matrices, these now compute in 350 μs, which is pretty much identical to MATLAB.

Thank you so much. Now it will be a bit harder for my colleagues to dismiss Julia straight off the bat. :laughing:

Thanks for this tip. I will have a look into it to see if it can lead to a further speed improvement.

You seem to interpolate `a` only in the second calculation; could that be causing the discrepancy?

Glad that worked. If it’s identical to MATLAB, then it is very likely that it’s really just a (in this case surprisingly large) difference between OpenBLAS and MKL. I don’t know anything about exponentiating skew-Hermitian matrices, but if it’s just a simple BLAS call in the end, there’s generally no performance difference between Julia and other languages which ultimately call the same BLAS routine (like MATLAB or Python/NumPy).

No, that’s just a copy-paste error. I did try both initially to see whether there was an issue with the OP not interpolating, but it doesn’t matter here:

julia> @btime exp(-1im*pi*$a);
  166.400 μs (16 allocations: 14.34 KiB)

julia> @btime exp(-1im*pi*a);
  173.200 μs (16 allocations: 14.34 KiB)

julia> using MKL

julia> @btime exp(-1im*pi*$a);
  7.550 μs (16 allocations: 14.34 KiB)

julia> @btime exp(-1im*pi*a);
  7.650 μs (16 allocations: 14.34 KiB)

This was why I was so surprised at first. I wasn’t switching to Julia for speed, but more for the programming experience. MATLAB’s lack of environments and dark mode means that I will forever avoid it when I can. Sadly, most of my research group works in MATLAB, so that’s where lots of projects start.

Thanks again for the help!

As a general rule, Julia shouldn’t be any slower than MATLAB (when using MKL) for almost all tasks, and often can be faster. There are a lot of lapsed MATLAB users on the forum (including myself!) who take offense to claims that MATLAB outperforms Julia, which in general means you are likely to get good responses to any posts of the form “Why is MATLAB so much faster than Julia in doing X?”


It’s like tricking designers into doing free work by using Comic Sans (which is a trap that catches me all the time).


For skew-Hermitian matrices, Julia falls back to a general algorithm (apparently a Padé approximant) intended for non-normal matrices (in the Julia 1.7 distribution, see stdlib/v1.7/LinearAlgebra/src/dense.jl line 613). That is more stable in the general case, but slower than the specialized method for Hermitian matrices, which just calls an eigendecomposition (symmetric.jl line 755); as far as I know, that is the best way to compute the exponential of a (dense, general) normal matrix. You can use @which exp(a) to see which method a call dispatches to. So here I would just eigendecompose a and compute the exponential from there. On my system, with OpenBLAS, it’s 6x faster.
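To make that concrete, here is a minimal sketch of the eigendecomposition route, assuming (as in the example above) that `a` is real symmetric so that `-im*π*a` is skew-Hermitian; the helper name `expim` is just for illustration:

```julia
using LinearAlgebra

# exp(-im*π*A) via the eigendecomposition A = V*Diagonal(λ)*V' (λ real, V unitary),
# so exp(-im*π*A) = V * Diagonal(exp.(-im*π*λ)) * V'.
function expim(A; t = -im * π)
    λ, V = eigen(Hermitian(A))
    return V * Diagonal(exp.(t .* λ)) * V'
end

expim(a) ≈ exp(-1im * π * a)   # should hold up to floating-point error
```

On Julia 1.7+ there is also `cis` for matrices, so something like `cis(-π * Hermitian(a))` may already take a similar specialized path, though I haven’t checked which method it actually dispatches to.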

Julia could plausibly do this automatically by checking for skew-Hermiticity in exp! and dispatching to the specialized method, or by dispatching on cis(A::Hermitian).


Thanks for this. I had noticed this myself when investigating yesterday; however, MATLAB doesn’t check for skew-Hermiticity either. So I decided to first solve the inherent speed issue and then potentially further optimize the algorithm.

Also, thanks for the tip about @which; I hadn’t come across it before.