I am very new to Julia and trying to convert my python codes to Julia to speed up my work. However, the present Julia version, pasted below, still runs slower than my python version.

I see two bottlenecks:

(1) the matrix exponential calculation, for which I have tried two methods. I have commented out the relatively slower one, fastExpm.

(2) the various matrix multiplications.

Are there any glaring inefficiencies in my code which can significantly speed up the process? I am surprised that this takes longer than my python code. I have set a small size for the matrix at the moment, N1=14, which creates the Hamiltonian of size 4x14=56 by 56. Ideally, this should work for N1~100.

```
using LinearAlgebra
using FastExpm
using ExponentialUtilities
N1 = 1; N = 2*N1; #size of Hamiltonian matrix (H_) is 2*N by 2*N
#Hamiltonian matrix H
h0 = [-1 0; 0 1];
hp = [-1 0; 0 1];
hm = [-1 0; 0 1];
H0 = kron(Diagonal(ones(N1,N1)), h0) + kron(diagm(1=>ones(N1-1)), hp) + kron(diagm(-1=>ones(N1-1)), hm);
H = [H0 zeros(Complex{Float64}, size(H0))
zeros(Complex{Float64}, size(H0)) H0];
for mn = 1:N
H[2*mn-1,2*mn] = 1; H[2*mn,2*mn-1] = 1;
end
H[(2*N1-1):2*N1, (2*N1+1):2*N1+2] = [1 0; 0 -1];
H[(2*N1+1):2*N1+2, (2*N1-1):2*N1] = [1 0; 0 -1];
#time
tmin = -80; tmax = 280;
Nt0 = 1500; tar0 = range(tmin,tmax,Nt0); deltat0 = tar0[2] - tar0[1];
#Green's function matrix
Glextt = zeros(Complex{Float64}, Nt0,2*N,2*N);
#Temporary green's function matrix, only used for the calculation for Glextt
Glesstt0 = zeros(Complex{Float64},2*N,2*N);
##-----------------CALCULATION WHICH NEEDS TO BE SPED UP----------------------------
for gh = Nt0:-1:2
println(gh)
Uu1ar = zeros(Complex{Float64},gh,2*N,2*N); Uu1ar[gh,:,:] = Diagonal(ones(2*N));
for hi = gh-1:-1:1
#option 1 https://github.com/fmentink/FastExpm.jl
#global Uu1ar[hi,:,:] = exp(-0.5*Gamma*deltat0)*Uu1ar[hi+1,:,:]*fastExpm(-im*H*deltat0);
#option 2 https://github.com/SciML/ExponentialUtilities.jl
global Uu1ar[hi,:,:] = exp(-0.5*deltat0)*Uu1ar[hi+1,:,:]*exponential!(-im*H*deltat0);
end
Gr = -im*Uu1ar;
Gap = conj(permutedims(-im*Uu1ar, (1, 3, 2)));
Siglex0 = zeros(Complex{Float64},gh,gh);
for mn = 1:gh
global Siglex0[mn,:] = -1 ./ (tar0[mn] .- tar0[1:gh] .+ im*deltat0);
end
for pq = 1:N
global Glesstt0[2*pq-1,2*pq] = deltat0^2*tr( transpose(Gr[:,2*pq-1,:])*Siglex0*Gap[:,:,2*pq] );
end
global Glextt[gh,:,:] = Glesstt0; #G<(t,t)
end
##----------------------------------------------------------------------------------
```