Broadcasting for a block diagonal matrix

I would like to implement a more efficient broadcast for basic arithmetic operators for the BlockDiagonal type of BlockDiagonals.jl. At the moment the broadcasting causes many allocations, I suppose because it falls back to base broadcasting.

julia> using BlockDiagonals
using BenchmarkTools

N1, N2, N3 = 3, 4, 5
N = N1 + N2 + N3

b1 = BlockDiagonal([rand(N1, N1), rand(N2, N2), rand(N3, N3)])
b2 = BlockDiagonal([rand(N1, N1), rand(N2, N2), rand(N3, N3)])

julia> @btime b3 = $(b1) .* $(b2);
  39.218 μs (867 allocations: 90.19 KiB)

julia> @btime b3 = $(Matrix(b1)) .* $(Matrix(b2));
  201.480 ns (1 allocation: 1.22 KiB)

The idea would be that when b1 and b2 have compatible block sizes, all these operations should be done block by block.

I have read the documentation on customizing overloading, but I have to say that I’m a bit lost. I have never dealt with custom types in Julia. Can someone point me to how this can be done in a neat way, or show a similar example?


I have never used BlockDiagonals.jl but…

@btime b6 = Matrix($b1) .* Matrix($b2);

it seems to be much faster than your first option, though not as fast as the other one.

Well that is expected because you put the conversion to Matrix outside of the interpolations and thus it is benchmarked by @btime, hence making it slower than my second example. In any case, my question is more about implementing the broadcasting interface.

For this kind of missing feature, I recommend opening an issue, better to discuss there since it will lead to a PR likely,
(wether by you or the maintainers)
So keeping the discussion of how to do it in one place is useful.

1 Like

I did, but still, it would be nice for my personal knowledge to get a hint on how it is done.

1 Like

You can look at the implementation of broadcasting in BlockBandedMatrices.jl: