Block sparse matrix multiplication with repeated blocks

I’m trying to find a good way to compute V^TAV where A is a sparse CSC matrix A and V = I \otimes B is a block diagonal matrix with repeating dense blocks (size of B is around \mathbb{R}^{20 \times 10}).

Looping through blocks A_i and computing B^TA_iB seems about 2x slower than just storing V as a sparse matrix, but I got about 2x speedup using dense matrix operations by popping A into a dense array \tilde{A}

and BLAS-3 multiplying \tilde{A} by B^T, B using reshape and permutedims.

This seems a little awkward to me, and I’m wondering if there are nicer “Julian” solutions that I’ve overlooked.

You could look at BlockArrays.jl, BlockDiagonals.jl, BlockBandedMatrices.jl to see if they help.


Thanks @dlfivefifty - I actually started with BlockArrays, and the convenient notation and functionality helped me get things working faster. I switched back to SparseMatrixCSC because it was faster without the conversion from BlockArrays (even after adapting this trick).

For that size of matrix you are unlikely to beat dense BLAS I think

1 Like

Thanks! Appreciate the feedback.