I have to perform the exponential of a matrix. I have compared the built-in exp function to the exponential! function from the ExponentialUtilites package. The latter is supposed to be much faster than the built-in function. However, I don’t see this!

A = randn(2^12, 2^12)
@time exp(im*A);
71.589389 seconds (2.02 M allocations: 1.880 GiB, 0.10% gc time, 2.65% compilation time)
A = randn(2^12, 2^12)
@time exponential!(im*A);
71.524002 seconds (3.93 M allocations: 1.741 GiB, 0.25% gc time, 5.79% compilation time)

Why is this so? And is there any other way to calculate the exponential of a large matrix efficiently, like using multi-threading?

I’m not an expert, but… if we just think about a taylor expansion:

x + x^2 + x^3 + x^4 times some coefficients.

This sum can be split into an accumulator where the final result will be stored and some chunk of memory were the calculation is being done. The calculation is you just keep multiplying by the same matrix and can be done iteratively: each step is multiplying some NxN matrix corresponding to the output of the previous step, times x which is also an NxN matrix. This has known size and can be allocated beforehand.

Alternatively, isn’t this the kind of thing that BLAS should be really good at doing? Why not just use that?

Hi, @stevengj! Yes, the matrix is Hermitian. And yes, you are right; I just need to multiply exp(im*A) by a vector. Could you please tell me what other possibilities you are referring to? Thanks!

Hi @tom-plaa! I am not sure how much accuracy I will lose by doing a series truncation. But I would prefer to calculate exp(im*A) since it’s important to preserve the unitarity.

Check the accuracy of the result. In my experience ExponentialUtilities is faster than KrylovKit with default settings. However KrylovKit tries very hard to give you the most accurate result (by using restart IIRC) while ExponentialUtilities just gives you less accurate results.

@Oscar_Smith How to reduce the number of allocations for expv? So in my work, I have to repeatedly apply the exp(im*H), where H is Hermitian, on a state, which gets updated in each run. Because of large memory allocations, I end up getting much slower computation with expv, than with exponential!. Here is a minimal working code for the latter

function run_sim(t_final, U, initial_state, final_state)
for t in 1:t_final
final_state .= U * initial_state
end
final_state
end
function main(n)
d = 2^n
initial_state = rand(ComplexF64, d)
final_state = initial_state
t_final = 10
A = rand(ComplexF64, (d, d))
H = (A .+ A') / 2
U = exponential!(im*H)
for _ in 1:10
final_state .= run_sim(t_final, U, initial_state, final_state)
end
end
@time main(10);
1.326368 seconds (122 allocations: 145.607 MiB)

Here is the same code using expv

function run_sim(t_final, H, initial_state, final_state)
for t in 1:t_final
final_state .= expv(im, H, initial_state)
end
final_state
end
function main(n)
d = 2^n
initial_state = rand(ComplexF64, d)
final_state = initial_state
t_final = 10
A = rand(ComplexF64, (d, d))
H = (A .+ A') / 2
for _ in 1:10
final_state .= run_sim(t_final, H, initial_state, final_state)
end
end
@time main(10);
5.373086 seconds (4.91 k allocations: 101.364 MiB)

You can reuse some memory by using caches but unfortunately the docs are broken for that. Here is how some of it works:

state = rand(ComplexF64, d)
krylovdim = 30 # default
# 1. create a Krylov subspace cache
KS = ExponentialUtilities.KrylovSubspace{ComplexF64}(length(state), min(krylovdim, length(state)))
# 2. populate Krylov space
ExponentialUtilities.arnoldi!(KS, H, state; ishermitian=true)
# 3. compute exponential, stores result back in state
ExponentialUtilities.expv!(state, -im*Δt, KS)
# now repeat 2. and 3.

I think this still allocates some memory but you cannot really work around it because you have complex valued states. But I am not quite sure about this.