Updating lazy kronecker products

I need to calculate something of the form:

where are dense matrices, are diagonal, is the identity matrix and

is an arbitrary scalar function.

The following code creates all the variations of Kronecker products that I need (transpose, Hermitian transpose and complex conjugation). It also allows me to update the products when I update .

using LazyArrays, BenchmarkTools


function kroneckerproducts(A)
    I = oneunit(A)

    # (I ⊗ A) and its transpose, Hermitian transpose and complex conjugate.
    IA = Kron(I,A)
    IAT, IAH, IAC = [transpose(IA), IA', transpose(IA')]

    # (A ⊗ I)...
    AI = Kron(A,I)
    ATI, AHI, ACI = [transpose(AI), AI', transpose(AI')]

    return [IA, AI, IAT, ATI, IAH, AHI, IAC, ACI]
end

function updatekroneckerproducts!(Karray,Klazy,B)
    # Should be enough to update (I ⊗ A) -> (I ⊗ B)
    Klazy[1].arrays[2] .= B
    copyto!.(Karray,Klazy)
end

N = 8
A = rand(ComplexF64,N,N)
K = kroneckerproducts(A)
Karr = Array.(K)

Anew = rand(ComplexF64,N,N)

@btime updatekroneckerproducts!($Karr,$K,$Anew) # 3.066 ms (8 allocations: 320 bytes)

These allocations don’t really seem to affect performance, but I am wondering if I can get rid of them when performing the sum.

Any suggestions?

1 Like

When you write this, you’re allocating a three-element array of arrays just to assign it into local variables. Just delete the square brackets to use a tuple and avoid that. Similarly, when you’re returning all those arrays, you probably want to return a tuple instead of an actual array.

2 Likes

@StefanKarpinski That allocating call is not measured in the @btime.

I’m surprised this allocates, I wonder if its a “fake” allocation just caused by timing: I sometimes see small allocations like this that disappear in more complicated code, perhaps caused by inlining.

Edit: Sorry, just realised what’s going on: the allocation is in the copyto! broadcasting and so yes, using a tuple should fix it.

1 Like

Things like “8 allocations” are often just an artifact of the function needing to allocate to return objects, which goes away when the caller is not a global scope.

2 Likes

Using tuples improves the situation, thank you!

Now @btime shows 2.671 ms (3 allocations: 64 bytes) (by removing square brackets in the constructor and letting the update-function return nothing).

The allocations still stack up when performing the sum:

using LinearAlgebra: Diagonal, mul!

function kroneckermultiply!(Karray, Klazy, a, b, total, aDb, work, Amatrices, Dmatrices)
    total .= 0.0

    for (A,D) in zip(Amatrices,Dmatrices)
        updatekroneckerproducts!(Karray,Klazy,A) # 3 allocations: 64 bytes, each iteration
        a .= Karr[1] .- Karr[4]     # (I ⊗ A) - (A^T ⊗ I)
        b .= Karr[5]                # (I ⊗ A^H)
        mul!(work, a, D)
        mul!(aDb,work,b)
        total .+=  aDb .* rand()
    end

    return total
end


function bench(N=8,M=10)
    Amatrices = eval.(rand(ComplexF64,N,N) for i = 1:M)
    Dmatrices = Array.(Diagonal(rand(ComplexF64,N^2,N^2)) for i in 1:M)

    K = kroneckerproducts(A)
    Karr = Array.(K)

    a, b, total, work, aDb = (Matrix{ComplexF64}(undef,N^2,N^2) for i in 1:5)

    @btime kroneckermultiply!($Karr,$K,$a,$b,$total,$aDb,$work,$Amatrices,$Dmatrices);
end

bench() # 33.987 ms (30 allocations: 640 bytes)