I would like to implement a more efficient broadcast for basic arithmetic operators for the BlockDiagonal
type of BlockDiagonals.jl. At the moment the broadcasting causes many allocations, I suppose because it falls back to base broadcasting.
julia> using BlockDiagonals
using BenchmarkTools
N1, N2, N3 = 3, 4, 5
N = N1 + N2 + N3
b1 = BlockDiagonal([rand(N1, N1), rand(N2, N2), rand(N3, N3)])
b2 = BlockDiagonal([rand(N1, N1), rand(N2, N2), rand(N3, N3)])
julia> @btime b3 = $(b1) .* $(b2);
39.218 μs (867 allocations: 90.19 KiB)
julia> @btime b3 = $(Matrix(b1)) .* $(Matrix(b2));
201.480 ns (1 allocation: 1.22 KiB)
The idea would be that when b1
and b2
have compatible block sizes, all these operations should be done block by block.
I have read the documentation on customizing overloading, but I have to say that I’m a bit lost. I have never dealt with custom types in Julia. Can someone point me to how this can be done in a neat way, or show a similar example?