Trying out the new 5-argument mul!
introduced in Julia 1.3, (ABα+Cβ → C), for the calculation of matrix commutator comu(A,B)=AB-BA. From my testing, with small (2x2 or 3x3) matrices, mul!
is actually faster than gemm!
, and when such functions are used in the inner-most/hot loop, it can lead to very noticeable performance differences:
using BenchmarkTools
using LinearAlgebra
ndim = 3;
m1=rand(ComplexF64,ndim,ndim);
m2=rand(ComplexF64,ndim,ndim);
ou=rand(ComplexF64,ndim,ndim);
function comu!(r, a, b)
idy = one(eltype(r))
mul!(r, a, b)
mul!(r, b, a, -idy, idy)
return nothing
end
function comu_b!(r, a, b)
idy = one(eltype(r))
mul!(r, a, b)
BLAS.gemm!('N', 'N', -idy, b, a, idy, r)
return nothing
end
@btime comu!($ou,$m1,$m2)
88.698 ns (0 allocations: 0 bytes)
@btime comu_b!($ou,$m1,$m2)
200.584 ns (0 allocations: 0 bytes)
Is there something special that’s going on here? As another comparison, using just 3-argument mul!
tmp = similar(ou);
function comu_mm!(r,a,b,h)
mul!(r,a,b)
mul!(h,b,a)
r .-= h
return nothing
end
@btime comu_mm!($ou,$m1,$m2,$tmp)
82.248 ns (0 allocations: 0 bytes)
Thanks!