Fast Exponential of a large matrix

I have to compute the exponential of a matrix. I have compared the built-in exp function to the exponential! function from the ExponentialUtilities package. The latter is supposed to be much faster than the built-in function. However, I don’t see this in practice!

using LinearAlgebra

A = randn(2^12, 2^12)
@time exp(im*A);

 71.589389 seconds (2.02 M allocations: 1.880 GiB, 0.10% gc time, 2.65% compilation time)


using ExponentialUtilities

A = randn(2^12, 2^12)
@time exponential!(im*A);

71.524002 seconds (3.93 M allocations: 1.741 GiB, 0.25% gc time, 5.79% compilation time)

Why is this so? And is there any other way to calculate the exponential of a large matrix efficiently, for example by using multi-threading?

1 Like

If you’re willing to sacrifice some accuracy, maybe an idea could be to truncate its series expansion: Matrix exponential - Wikipedia

Wait, can’t this be done without allocating?

I’m not an expert, but… if we just think about a Taylor expansion:

1 + x + x^2/2! + x^3/3! + ⋯, i.e. powers of x times known coefficients.

This sum can be split into an accumulator, where the final result will be stored, and a scratch buffer where the calculation is done. The calculation is iterative: you just keep multiplying by the same matrix. Each step multiplies an NxN matrix (the output of the previous step) by x, which is also an NxN matrix. Everything has known size and can be allocated beforehand, as in the sketch below.
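For concreteness, a minimal sketch of that preallocated-buffer idea (taylor_exp! is a hypothetical name; as noted further down the thread, a truncated Taylor series is a poor way to compute matrix exponentials in general):

using LinearAlgebra

# exp(X) ≈ I + X + X^2/2! + ... + X^K/K!, with all work arrays preallocated
function taylor_exp!(acc, term, tmp, X, K)
    copyto!(acc, I)            # accumulator starts at the identity
    copyto!(term, I)           # running term; holds X^k / k! at step k
    for k in 1:K
        mul!(tmp, term, X)     # tmp = term * X, computed in place
        term .= tmp ./ k       # fold the 1/k! coefficient in incrementally
        acc .+= term           # accumulate into the result
    end
    acc
end

X = 0.1 .* randn(128, 128)
E = taylor_exp!(similar(X), similar(X), similar(X), X, 20)
E ≈ exp(X)   # true for a well-scaled X like this one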

Alternatively, isn’t this the kind of thing that BLAS should be really good at doing? Why not just use that?
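For what it’s worth, the dense exp already spends essentially all of its time in BLAS matrix multiplies, so one cheap thing to check is the BLAS thread count (these are stdlib LinearAlgebra functions; whether the count is already at the maximum on your machine is something to verify):

using LinearAlgebra
@show BLAS.get_num_threads()             # threads BLAS is currently using
BLAS.set_num_threads(Sys.CPU_THREADS)    # let BLAS use all hardware threads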

If you have a Hermitian A you should use cis(A) instead.
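A minimal sketch, assuming Julia 1.7+ where LinearAlgebra extends cis to matrices; the Hermitian wrapper routes it through an eigendecomposition, so the result is unitary up to roundoff:

using LinearAlgebra
A = randn(512, 512)
H = Hermitian((A + A') / 2)   # Hermitian wrapper enables the eigen-based path
U = cis(H)                    # U ≈ exp(im*H)
U * U' ≈ I                    # sanity check: unitary up to roundoff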

Do you need the whole matrix, or do you just need to multiply it by a vector? In the latter case there are faster possibilities.

3 Likes

Hi, @stevengj! Yes, the matrix is Hermitian. And yes, you are right; I just need to multiply exp(im*A) by a vector. Could you please tell me what other possibilities you are referring to? Thanks!

Hi @tom-plaa! I am not sure how much accuracy I would lose by doing a series truncation. But I would prefer to calculate exp(im*A) itself, since it’s important to preserve unitarity.

expv, which uses Krylov methods, is likely much better here.
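In case a concrete call helps, here is a small sketch; expv(t, A, b) from ExponentialUtilities computes exp(t*A)*b via a Krylov subspace, without ever forming the dense exponential (the sizes here are illustrative):

using ExponentialUtilities, LinearAlgebra
d = 2^12
A = randn(d, d)
H = Hermitian((A + A') / 2)
v = normalize!(rand(ComplexF64, d))
w = expv(im, H, v)   # w ≈ exp(im*H) * v; no d×d exponential is ever formed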

4 Likes

@Oscar_Smith this is wonderful! Thanks!

It’s well known that Taylor series are a terrible way to compute matrix exponentials in general: Taking gradients of a matrix exponential - #8 by stevengj

3 Likes

Unfortunately, it looks like ExponentialUtilities.jl can’t currently exploit this: cisv(t, A, b) to compute exp(im*A*t)*b for Hermitian A? · Issue #176 · SciML/ExponentialUtilities.jl · GitHub

It might be worth trying KrylovKit.jl instead, since it supports this case in its exponentiate function, by passing an imaginary t and a Hermitian A.
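For example (a sketch; exponentiate(A, t, x) returns the result together with a convergence-info object):

using KrylovKit, LinearAlgebra
d = 2^10
A = randn(d, d)
H = Hermitian((A + A') / 2)
v = rand(ComplexF64, d)
w, info = exponentiate(H, im, v)   # w ≈ exp(im*H) * v; info reports convergence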

3 Likes

@stevengj it seems that expv in ExponentialUtilities is much faster than exponentiate in KrylovKit:

@time expv(im, B, phi);
0.491426 seconds (46 allocations: 1.065 MiB)

@time exponentiate(B, im, phi);
11.420562 seconds (973 allocations: 26.749 MiB, 0.10% gc time)

Check the accuracy of the result. In my experience, ExponentialUtilities is faster than KrylovKit with default settings. However, KrylovKit tries very hard to give you the most accurate result (by using restarts, IIRC), while ExponentialUtilities just gives you a less accurate one. One way to compare is sketched below.
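A sketch of such a check, at a size where the dense exponential is still cheap enough to serve as the reference:

using ExponentialUtilities, KrylovKit, LinearAlgebra
d = 256
A = randn(d, d)
H = Hermitian((A + A') / 2)
v = normalize!(rand(ComplexF64, d))
ref = exp(im * Matrix(H)) * v            # dense result as ground truth
w1 = expv(im, H, v)                      # ExponentialUtilities
w2, _ = exponentiate(H, im, v)           # KrylovKit
norm(w1 - ref), norm(w2 - ref)           # error of each Krylov result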

@Oscar_Smith how can I reduce the number of allocations for expv? In my work I have to repeatedly apply exp(im*H), where H is Hermitian, to a state that gets updated in each run. Because of the large memory allocations, I end up with a much slower computation with expv than with exponential!. Here is a minimal working example for the latter:

using LinearAlgebra, ExponentialUtilities

function run_sim(t_final, U, initial_state, final_state)
    for t in 1:t_final
        final_state .= U * initial_state   # U * initial_state allocates a temporary each iteration
    end
    final_state
end

function main(n)
    d = 2^n
    initial_state = rand(ComplexF64, d)
    final_state = initial_state   # aliases initial_state, so the state is advanced in place

    t_final = 10

    A = rand(ComplexF64, (d, d))
    H = (A .+ A') / 2             # Hermitian by construction

    U = exponential!(im*H)        # dense propagator, computed once up front

    for _ in 1:10
        final_state .= run_sim(t_final, U, initial_state, final_state)
    end
end

@time main(10);
  1.326368 seconds (122 allocations: 145.607 MiB)

Here is the same code using expv

function run_sim(t_final, H, initial_state, final_state)
    for t in 1:t_final
        final_state .= expv(im, H, initial_state)   # builds a fresh Krylov subspace (and allocates) on every call
    end
    final_state
end

function main(n)
    d = 2^n
    initial_state = rand(ComplexF64, d)
    final_state = initial_state

    t_final = 10

    A = rand(ComplexF64, (d, d))
    H = (A .+ A') / 2

    for _ in 1:10
        final_state .= run_sim(t_final, H, initial_state, final_state)
    end
end

@time main(10);
  5.373086 seconds (4.91 k allocations: 101.364 MiB)

You can reuse some memory by using caches, but unfortunately the docs for that are currently broken. Here is how some of it works:

state = rand(ComplexF64, d)   # d and H as in your example above; Δt is your time step
krylovdim = 30                # the default subspace size
# 1. create a Krylov subspace cache, allocated once
KS = ExponentialUtilities.KrylovSubspace{ComplexF64}(length(state), min(krylovdim, length(state)))
# 2. populate the Krylov subspace (ishermitian=true uses the cheaper Lanczos recurrence)
ExponentialUtilities.arnoldi!(KS, H, state; ishermitian=true)
# 3. compute the exponential action; the result is stored back in state
ExponentialUtilities.expv!(state, -im*Δt, KS)
# now repeat 2. and 3.

I think this still allocates some memory, but you cannot really work around that because you have complex-valued states. I am not quite sure about this, though. A sketch of how it might plug into your loop is below.
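Wiring those pieces into the run_sim pattern from above might look like this (a sketch; run_sim! is a hypothetical name, and I use t = 1.0im to match your earlier expv call rather than -im*Δt):

using ExponentialUtilities, LinearAlgebra

function run_sim!(t_final, H, state, KS)
    for _ in 1:t_final
        ExponentialUtilities.arnoldi!(KS, H, state; ishermitian=true)   # rebuild the subspace from the current state
        ExponentialUtilities.expv!(state, 1.0im, KS)                    # state ← exp(im*H) * state, reusing KS
    end
    state
end

d = 2^10
A = rand(ComplexF64, (d, d))
H = (A .+ A') / 2
state = rand(ComplexF64, d)
KS = ExponentialUtilities.KrylovSubspace{ComplexF64}(d, min(30, d))
run_sim!(10, H, state, KS)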

1 Like

It does reduce the allocations, but they are still fairly large:

  5.003996 seconds (1.51 k allocations: 58.343 MiB)