With matrix commutator [A,B] \equiv AB-BA, consider the sum of \sum_{i=1}^k [MF_i, F_{k-i+1}], where M and F_{j} are known matrices. I’m using something like this that allocate in advance:
using LinearAlgebra
using BenchmarkTools
function comu!(r, a, b) # r = a*b-b*a, allocated r
idy = one(eltype(r))
mul!(r, a, b)
mul!(r, b, a, -idy, idy)
end
function cm_sum!(res_k, k, f, mx, tmpM, mcmu) # add sum to res_k
for ik = 1:k
mul!(tmpM, mx, f[ik])
comu!(mcmu, tmpM, f[k-ik+1])
res_k .-= mcmu
end
end
ndim = 50
k = 60
r = zeros(Float64, ndim, ndim)
mx = rand(ndim, ndim)
t1 = similar(r)
t2 = similar(r)
fs = [rand(ndim, ndim) for j = 1:k]
@btime cm_sum!($r, $k, $fs, $mx, $t1, $t2)
834.737 μs (0 allocations: 0 bytes)
Other than @inbounds
and threading/parallelization, are there other julia-specific tricks or more efficient algorithms for this sum?
usage case: mx and fs in real use is not random but given/calculated; this res_k is one term in a big differential equation system where k also loops over 1 through some cutoff N. Since differential equation also loops over a time domain 0 through t_final, and the diff. eqn. is iterated over thousands of initial values, this inner-most loop is quite important performance-wise.
Thanks!