Profiling the allocations showed that the `exponential!`

call was a significant source, so I played around with using `alloc_mem`

and trying various algorithms listed in the documentation. It turned out that the fastest algorithm for this application is `ExpMethodGeneric()`

which surprised me. Here is the latest version of the function:

```
using LinearAlgebra
using ExponentialUtilities
using ExponentialUtilities: alloc_mem
function compute3()
N1 = 1; N = 2*N1 #size of Hamiltonian matrix (H_) is 2*N by 2*N
#Hamiltonian matrix H
h0 = [-1 0; 0 1]
hp = [-1 0; 0 1]
hm = [-1 0; 0 1]
H0 = kron(Diagonal(ones(N1)), h0) + kron(diagm(1=>ones(N1-1)), hp) + kron(diagm(-1=>ones(N1-1)), hm)
H = [H0 zeros(ComplexF64, size(H0))
zeros(ComplexF64, size(H0)) H0]
#@show size(H)
for mn = 1:N
H[2*mn-1,2*mn] = H[2*mn,2*mn-1] = 1
end
H[(2*N1-1):2*N1, (2*N1+1):2*N1+2] .= [1 0; 0 -1]
H[(2*N1+1):2*N1+2, (2*N1-1):2*N1] .= [1 0; 0 -1]
#time
tmin = -80; tmax = 280;
Nt0 = 1500; tar0 = range(tmin,tmax,Nt0); deltat0 = tar0[2] - tar0[1]
#Green's function matrix
Glextt = zeros(ComplexF64, 2*N, 2*N, Nt0)
#Temporary green's function matrix, only used for the calculation for Glextt
Glesstt0 = zeros(ComplexF64,2*N,2*N)
##-----------------CALCULATION WHICH NEEDS TO BE SPED UP----------------------------
Uu1armax = zeros(ComplexF64, 2N, 2N, Nt0)
Siglex0max = zeros(ComplexF64, Nt0, Nt0)
#method = ExpMethodHigham2005() # 71.087441 seconds (3.40 M allocations: 2.133 GiB, 0.26% gc time)
#method = ExpMethodNative() # 72.297242 seconds (13.52 M allocations: 4.897 GiB, 0.64% gc time)
#method = ExpMethodDiagonalization() # 94.820985 seconds (36.26 M allocations: 14.452 GiB, 1.21% gc time, 0.99% compilation time)
method = ExpMethodGeneric() # 40.761433 seconds (6.78 M allocations: 3.188 GiB, 0.72% gc time)
#method = ExpMethodHigham2005Base() # 70.791615 seconds (7.90 M allocations: 2.518 GiB, 0.31% gc time)
cache = alloc_mem(H, method)
for gh = Nt0:-1:2
#println(gh)
Uu1ar = (@view Uu1armax[:, :, 1:gh]); Uu1ar[:, :, gh] .= I(2N)
f1 = exp(-0.5 * deltat0)
for hi = gh-1:-1:1
f2 = exponential!(-im * deltat0 * H, method, cache)
Uu1ar[:, :, hi] .= f1 * (@view Uu1ar[:, :, hi+1]) * f2
end
Gr = -im*Uu1ar
Gap = conj(permutedims(-im*Uu1ar, (3, 2, 1)))
Siglex0 = @view Siglex0max[1:gh, 1:gh]
for mn = 1:gh
Siglex0[mn, :] .= -1 ./ (tar0[mn] .- tar0[1:gh] .+ im*deltat0)
end
for pq = 1:N
Glesstt0[2*pq-1, 2*pq] = deltat0^2 * tr((@view Gr[2*pq-1, :, :]) * Siglex0 * @view Gap[:, :, 2*pq])
end
Glextt[:, :, gh] .= Glesstt0 #G<(t,t)
end
##----------------------------------------------------------------------------------
return Glextt
end
```

As shown in the comments above after the `method = `

lines, the time using `ExpMethodGeneric`

is reduced to about 41 seconds, much less than the other methods. I also verified that the final matrix `Glextt`

, when permuted, is nearly equal to that produced by the OP code to within about `5e-14`

. Perhaps @ChrisRackauckas can explain why this method is so much faster than the default `ExpMethodHigham2005()`

method.