If you have a parallel mapreduce(f, op, x) (e.g., reduce(op, Map(f), x) from Transducers.jl), a neat way to minimize allocations while still calling mul! would be to use LazyArrays. Something like this (untested):
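using LazyArrays: @~
using Transducers: Map
# @~ keeps x'x as a lazy product instead of allocating the result right away
z = reduce(Map(x -> @~ x'x), x; init=nothing) do a, b
    # first element: materialize the product; afterwards: accumulate into a in place
    a === nothing ? copy(b) : a .+= b
end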
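For comparison, here is a minimal serial sketch of the in-place accumulation that the lazy version is aiming for, using only the five-argument mul! from LinearAlgebra (available since Julia 1.3). It assumes every element of the collection is a matrix with the same number of columns; sum_gram and xs are hypothetical names used only for illustration, not part of any package.

```julia
using LinearAlgebra: mul!

# Hypothetical helper: sums xi' * xi over a collection of matrices
# that all have the same number of columns.
function sum_gram(xs)
    first_x = first(xs)
    n = size(first_x, 2)
    acc = zeros(eltype(first_x), n, n)   # single output buffer, reused below
    for xi in xs
        # Five-argument mul!: acc = xi' * xi * true + acc * true, updated in place
        mul!(acc, xi', xi, true, true)
    end
    return acc
end

xs = [rand(4, 3) for _ in 1:10]
z = sum_gram(xs)
```

This version is not parallel; it only shows the per-element update that a parallel reduction would perform on each task's accumulator before the partial results are combined.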
See also: Parallel reductions