There is also this workaround, where I wrote a non-allocating broadcast_reduce, but the Tullio solution looks very elegant.
broadcast_reduce
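
For readers who have not seen the idea, here is a minimal sketch of what a non-allocating broadcast-then-reduce can look like. This is not necessarily the implementation linked above: the helper name, its argument order, and the `init` keyword are my own assumptions for illustration. It builds a lazy `Broadcasted` object with `Base.Broadcast.broadcasted` and reduces over it directly, so the intermediate array is never materialized.

```julia
using Base.Broadcast: broadcasted, instantiate

# Hypothetical helper (name and signature are assumptions, not the linked code):
# apply `f` elementwise across `args` with broadcasting semantics and fold the
# results with `op`, without allocating the broadcasted array.
function broadcast_reduce(f, op, args...; init)
    # `broadcasted` builds a lazy Broadcasted object; `instantiate` computes its axes.
    bc = instantiate(broadcasted(f, args...))
    # Reducing over the lazy object visits each element once.
    return reduce(op, bc; init=init)
end

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0]

# Same value as sum(x .* y'), but without building the 3×2 temporary.
total = broadcast_reduce(*, +, x, y'; init=0.0)
```

Here `total` should match `sum(x .* y')`. The explicit `init` keeps the sketch well-defined for empty inputs; whether this is actually allocation-free in a given case is worth checking with `@allocated` or BenchmarkTools.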