Updating lazy kronecker products

eliassno · April 2, 2019, 6:20pm

I need to calculate something of the form: $\sum_i \left( (I \otimes A_i) - (A_i^T \otimes I) \right) D_i (I \otimes A_i^H) f_i$

where $\small A_i$ are dense matrices, $\small D_i$ are diagonal, $\small I$ is the identity matrix and $\small f_i$ is an arbitrary scalar function.
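The transposed and conjugated factors in the sum do not have to be built separately, because transposition and conjugation distribute over the Kronecker product: $(A \otimes I)^T = A^T \otimes I$ and $(I \otimes A)^H = I \otimes A^H$. A minimal check of these identities, using the dense `kron` from the standard library rather than a lazy type (the size `N = 3` and the variable names are illustrative assumptions):

```julia
using LinearAlgebra

N = 3
A = rand(ComplexF64, N, N)
Id = Matrix{ComplexF64}(I, N, N)   # dense identity, same size as A

IA = kron(Id, A)   # I ⊗ A
AI = kron(A, Id)   # A ⊗ I

# Transposition, adjoint and conjugation distribute over ⊗.
transpose_ok = transpose(AI) ≈ kron(Matrix(transpose(A)), Id)   # (A ⊗ I)^T = A^T ⊗ I
adjoint_ok   = IA' ≈ kron(Id, Matrix(A'))                       # (I ⊗ A)^H = I ⊗ A^H
conj_ok      = transpose(IA') ≈ kron(Id, conj(A))               # conj(I ⊗ A) = I ⊗ conj(A)
```

This is why only two base products, (I ⊗ A) and (A ⊗ I), are needed: the other variants can be obtained by wrapping them.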

The following code creates all the variations of Kronecker products that I need (transpose, Hermitian transpose and complex conjugation). It also allows me to update the products when I update $\small A_i$.

using LazyArrays, BenchmarkTools


function kroneckerproducts(A)
    I = oneunit(A)

    # (I ⊗ A) and its transpose, Hermitian transpose and complex conjugate.
    IA = Kron(I,A)
    IAT, IAH, IAC = [transpose(IA), IA', transpose(IA')]

    # (A ⊗ I)...
    AI = Kron(A,I)
    ATI, AHI, ACI = [transpose(AI), AI', transpose(AI')]

    return [IA, AI, IAT, ATI, IAH, AHI, IAC, ACI]
end

function updatekroneckerproducts!(Karray,Klazy,B)
    # Should be enough to update (I ⊗ A) -> (I ⊗ B)
    Klazy[1].arrays[2] .= B
    copyto!.(Karray,Klazy)
end

N = 8
A = rand(ComplexF64,N,N)
K = kroneckerproducts(A)
Karr = Array.(K)

Anew = rand(ComplexF64,N,N)

@btime updatekroneckerproducts!($Karr,$K,$Anew) # 3.066 ms (8 allocations: 320 bytes)

These allocations don’t really seem to affect performance, but I am wondering if I can get rid of them when performing the sum.

Any suggestions?

StefanKarpinski · April 2, 2019, 8:15pm

When you write IAT, IAH, IAC = [transpose(IA), IA', transpose(IA')], you’re allocating a three-element array of arrays just to assign its elements to local variables. Delete the square brackets to use a tuple and avoid that. Similarly, when you’re returning all those arrays, you probably want to return a tuple instead of an actual array.
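A small sketch of the difference, using only Base wrappers instead of LazyArrays (the function names `views_array`, `views_tuple`, `alloc_array` and `alloc_tuple` are made up for illustration). The exact byte counts depend on the Julia version, so none are quoted here:

```julia
# Lazy wrappers of a matrix, destructured from an array literal.
# The literal builds a Vector on the heap before the assignment.
function views_array(A)
    T, H, C = [transpose(A), A', transpose(A')]
    return T, H, C
end

# Same wrappers, destructured from a tuple: no Vector is created.
function views_tuple(A)
    T, H, C = transpose(A), A', transpose(A')
    return T, H, C
end

# Measure inside functions so that global-scope overhead is not counted.
alloc_array(A) = @allocated views_array(A)
alloc_tuple(A) = @allocated views_tuple(A)

A = rand(ComplexF64, 8, 8)
alloc_array(A); alloc_tuple(A)   # warm up (compile) before measuring
println("array literal: ", alloc_array(A), " bytes")
println("tuple:         ", alloc_tuple(A), " bytes")
```

The array version also loses the concrete element types, since the three wrappers have different types and end up in a Vector with an abstract element type.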

dlfivefifty · April 2, 2019, 8:42pm

@StefanKarpinski That allocating call is not measured in the @btime.

I’m surprised this allocates. I wonder if it’s a “fake” allocation just caused by timing: I sometimes see small allocations like this that disappear in more complicated code, perhaps caused by inlining.

Edit: Sorry, just realised what’s going on: the allocation is in the copyto! broadcasting and so yes, using a tuple should fix it.
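To illustrate the point about broadcasting: `copyto!.(dests, srcs)` collects the return values of `copyto!` into a new container, which is a `Vector` when the inputs are Vectors and a tuple when they are tuples. A rough sketch with Base wrappers standing in for the lazy products (`srcs`, `dests`, `update_broadcast!` and `update_foreach!` are assumed names, not part of the original code):

```julia
# Stand-ins for the lazy products and their dense buffers.
A = rand(ComplexF64, 8, 8)
srcs  = (transpose(A), A', transpose(A'))                      # lazy sources, as a tuple
dests = map(s -> Matrix{ComplexF64}(undef, size(s)), srcs)     # preallocated dense targets

# Broadcasting returns a container holding the results of each copyto! call.
update_broadcast!(dests, srcs) = copyto!.(dests, srcs)

# foreach calls copyto! for its side effect only and returns nothing.
update_foreach!(dests, srcs) = foreach(copyto!, dests, srcs)

update_foreach!(dests, srcs)
```

Whether either variant ends up completely allocation-free can depend on the Julia version and on the element types, so it is worth confirming with `@btime` on the real lazy products.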

StefanKarpinski · April 2, 2019, 8:44pm

Things like “8 allocations” are often just an artifact of the function needing to allocate to return objects, which goes away when the caller is not a global scope.

eliassno · April 3, 2019, 1:22am

Using tuples improves the situation, thank you!

Now @btime shows 2.671 ms (3 allocations: 64 bytes), after removing the square brackets in the constructor and making the update function return nothing.

The allocations still stack up when performing the sum:

using LinearAlgebra: Diagonal, mul!

function kroneckermultiply!(Karray, Klazy, a, b, total, aDb, work, Amatrices, Dmatrices)
    total .= 0.0

    for (A,D) in zip(Amatrices,Dmatrices)
        updatekroneckerproducts!(Karray,Klazy,A) # 3 allocations: 64 bytes, each iteration
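        a .= Karray[1] .- Karray[4] # (I ⊗ A) - (A^T ⊗ I); use the argument, not the global Karr
        b .= Karray[5]              # (I ⊗ A^H)
        mul!(work, a, D)            # work = a * D
        mul!(aDb, work, b)          # aDb = work * b
        total .+= aDb .* rand()     # rand() is a placeholder for the scalar f_i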
    end

    return total
end


function bench(N=8,M=10)
    Amatrices = [rand(ComplexF64,N,N) for i in 1:M]
    Dmatrices = [Array(Diagonal(rand(ComplexF64,N^2))) for i in 1:M]

    # Use a separate matrix here: updatekroneckerproducts! overwrites it in place.
    K = kroneckerproducts(rand(ComplexF64,N,N))
    Karr = Array.(K)

    a, b, total, work, aDb = (Matrix{ComplexF64}(undef,N^2,N^2) for i in 1:5)

    @btime kroneckermultiply!($Karr,$K,$a,$b,$total,$aDb,$work,$Amatrices,$Dmatrices);
end

bench() # 33.987 ms (30 allocations: 640 bytes)
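Separately from the allocations, one possible simplification (not something proposed in the thread) is to exploit the diagonal structure of $\small D_i$: multiplying by a diagonal matrix from the right only scales the columns, so the dense `mul!(work, a, D)` could be replaced by an in-place broadcast. A minimal sketch, assuming the diagonal is stored as a vector `d` (a hypothetical name) and using a small illustrative size:

```julia
using LinearAlgebra

n = 4                                    # illustrative size (N^2 in the code above)
a = rand(ComplexF64, n, n)
d = rand(ComplexF64, n)                  # diagonal entries of D, stored as a vector
work = Matrix{ComplexF64}(undef, n, n)

# a * Diagonal(d) scales column j of a by d[j]; the broadcast writes into work in place.
work .= a .* transpose(d)
```

`transpose(d)` is a lazy row vector, so the broadcast does not need a temporary matrix.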
