# Updating lazy kronecker products

I need to calculate something of the form:  is an arbitrary scalar function.

The following code creates all the variations of Kronecker products that I need (transpose, Hermitian transpose and complex conjugation). It also allows me to update the products when I update .

using LazyArrays, BenchmarkTools

function kroneckerproducts(A)
I = oneunit(A)

# (I ⊗ A) and its transpose, Hermitian transpose and complex conjugate.
IA = Kron(I,A)
IAT, IAH, IAC = [transpose(IA), IA', transpose(IA')]

# (A ⊗ I)...
AI = Kron(A,I)
ATI, AHI, ACI = [transpose(AI), AI', transpose(AI')]

return [IA, AI, IAT, ATI, IAH, AHI, IAC, ACI]
end

function updatekroneckerproducts!(Karray,Klazy,B)
# Should be enough to update (I ⊗ A) -> (I ⊗ B)
Klazy.arrays .= B
copyto!.(Karray,Klazy)
end

N = 8
A = rand(ComplexF64,N,N)
K = kroneckerproducts(A)
Karr = Array.(K)

Anew = rand(ComplexF64,N,N)

@btime updatekroneckerproducts!($Karr,$K,$Anew) # 3.066 ms (8 allocations: 320 bytes)  These allocations don’t really seem to affect performance, but I am wondering if I can get rid of them when performing the sum. Any suggestions? 1 Like When you write this, you’re allocating a three-element array of arrays just to assign it into local variables. Just delete the square brackets to use a tuple and avoid that. Similarly, when you’re returning all those arrays, you probably want to return a tuple instead of an actual array. 2 Likes @StefanKarpinski That allocating call is not measured in the @btime. I’m surprised this allocates, I wonder if its a “fake” allocation just caused by timing: I sometimes see small allocations like this that disappear in more complicated code, perhaps caused by inlining. Edit: Sorry, just realised what’s going on: the allocation is in the copyto! broadcasting and so yes, using a tuple should fix it. 1 Like Things like “8 allocations” are often just an artifact of the function needing to allocate to return objects, which goes away when the caller is not a global scope. 2 Likes Using tuples improves the situation, thank you! Now @btime shows 2.671 ms (3 allocations: 64 bytes) (by removing square brackets in the constructor and letting the update-function return nothing). The allocations still stack up when performing the sum: using LinearAlgebra: Diagonal, mul! function kroneckermultiply!(Karray, Klazy, a, b, total, aDb, work, Amatrices, Dmatrices) total .= 0.0 for (A,D) in zip(Amatrices,Dmatrices) updatekroneckerproducts!(Karray,Klazy,A) # 3 allocations: 64 bytes, each iteration a .= Karr .- Karr # (I ⊗ A) - (A^T ⊗ I) b .= Karr # (I ⊗ A^H) mul!(work, a, D) mul!(aDb,work,b) total .+= aDb .* rand() end return total end function bench(N=8,M=10) Amatrices = eval.(rand(ComplexF64,N,N) for i = 1:M) Dmatrices = Array.(Diagonal(rand(ComplexF64,N^2,N^2)) for i in 1:M) K = kroneckerproducts(A) Karr = Array.(K) a, b, total, work, aDb = (Matrix{ComplexF64}(undef,N^2,N^2) for i in 1:5) @btime kroneckermultiply!($Karr,$K,$a,$b,$total,$aDb,$work,$Amatrices,$Dmatrices);
end

bench() # 33.987 ms (30 allocations: 640 bytes)