Profiling the allocations showed that the exponential!
call was a significant source, so I played around with using alloc_mem
and trying various algorithms listed in the documentation. It turned out that the fastest algorithm for this application is ExpMethodGeneric()
which surprised me. Here is the latest version of the function:
using LinearAlgebra
using ExponentialUtilities
using ExponentialUtilities: alloc_mem
function compute3()
N1 = 1; N = 2*N1 #size of Hamiltonian matrix (H_) is 2*N by 2*N
#Hamiltonian matrix H
h0 = [-1 0; 0 1]
hp = [-1 0; 0 1]
hm = [-1 0; 0 1]
H0 = kron(Diagonal(ones(N1)), h0) + kron(diagm(1=>ones(N1-1)), hp) + kron(diagm(-1=>ones(N1-1)), hm)
H = [H0 zeros(ComplexF64, size(H0))
zeros(ComplexF64, size(H0)) H0]
#@show size(H)
for mn = 1:N
H[2*mn-1,2*mn] = H[2*mn,2*mn-1] = 1
end
H[(2*N1-1):2*N1, (2*N1+1):2*N1+2] .= [1 0; 0 -1]
H[(2*N1+1):2*N1+2, (2*N1-1):2*N1] .= [1 0; 0 -1]
#time
tmin = -80; tmax = 280;
Nt0 = 1500; tar0 = range(tmin,tmax,Nt0); deltat0 = tar0[2] - tar0[1]
#Green's function matrix
Glextt = zeros(ComplexF64, 2*N, 2*N, Nt0)
#Temporary green's function matrix, only used for the calculation for Glextt
Glesstt0 = zeros(ComplexF64,2*N,2*N)
##-----------------CALCULATION WHICH NEEDS TO BE SPED UP----------------------------
Uu1armax = zeros(ComplexF64, 2N, 2N, Nt0)
Siglex0max = zeros(ComplexF64, Nt0, Nt0)
#method = ExpMethodHigham2005() # 71.087441 seconds (3.40 M allocations: 2.133 GiB, 0.26% gc time)
#method = ExpMethodNative() # 72.297242 seconds (13.52 M allocations: 4.897 GiB, 0.64% gc time)
#method = ExpMethodDiagonalization() # 94.820985 seconds (36.26 M allocations: 14.452 GiB, 1.21% gc time, 0.99% compilation time)
method = ExpMethodGeneric() # 40.761433 seconds (6.78 M allocations: 3.188 GiB, 0.72% gc time)
#method = ExpMethodHigham2005Base() # 70.791615 seconds (7.90 M allocations: 2.518 GiB, 0.31% gc time)
cache = alloc_mem(H, method)
for gh = Nt0:-1:2
#println(gh)
Uu1ar = (@view Uu1armax[:, :, 1:gh]); Uu1ar[:, :, gh] .= I(2N)
f1 = exp(-0.5 * deltat0)
for hi = gh-1:-1:1
f2 = exponential!(-im * deltat0 * H, method, cache)
Uu1ar[:, :, hi] .= f1 * (@view Uu1ar[:, :, hi+1]) * f2
end
Gr = -im*Uu1ar
Gap = conj(permutedims(-im*Uu1ar, (3, 2, 1)))
Siglex0 = @view Siglex0max[1:gh, 1:gh]
for mn = 1:gh
Siglex0[mn, :] .= -1 ./ (tar0[mn] .- tar0[1:gh] .+ im*deltat0)
end
for pq = 1:N
Glesstt0[2*pq-1, 2*pq] = deltat0^2 * tr((@view Gr[2*pq-1, :, :]) * Siglex0 * @view Gap[:, :, 2*pq])
end
Glextt[:, :, gh] .= Glesstt0 #G<(t,t)
end
##----------------------------------------------------------------------------------
return Glextt
end
As shown in the comments above after the method =
lines, the time using ExpMethodGeneric
is reduced to about 41 seconds, much less than the other methods. I also verified that the final matrix Glextt
, when permuted, is nearly equal to that produced by the OP code to within about 5e-14
. Perhaps @ChrisRackauckas can explain why this method is so much faster than the default ExpMethodHigham2005()
method.