I have a function that operates on tuples of arrays using broadcasting, but I’m rewriting it because it doesn’t play well with ForwardDiff.jl. However, I’ve noticed that the version that loops through the array entries is slower and allocates more.
I’ve included what I hope is a minimal working example. I take tuples of arrays x and y and sum them into out:

    x = ntuple(x->randn(2,4),2)
    y = ntuple(x->randn(2,4),2)

    function bcast(x,y)
        fsum(x,y) = x + y
        out = fsum.(x,y)
        return out
    end

    function loop(x,y)
        out = ntuple(a->zeros(size(x)),length(x))
        for i = 1:length(x)
            xi = (x->x[i]).(x)
            yi = (x->x[i]).(y)
            fsum!(x,y,out) = out[i] = x + y
            fsum!.(xi,yi,out)
        end
        return out
    end

Timing each one gives me

    julia> @btime bcast($x,$y)
      116.892 ns (3 allocations: 320 bytes)

    julia> @btime loop($x,$y)
      2.159 μs (52 allocations: 1.83 KiB)

The extra allocations in the looped function come from fsum!.(xi,yi,out), but I’m having trouble figuring out why. I tried @code_warntype, but I haven’t been able to interpret its output.
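
For comparison, a loop-based version can also be written with no closures and no broadcast inside the loop, which might make it easier to see which piece is responsible for the allocations. The following is only a sketch under assumptions: the name `loop_plain` is hypothetical, it assumes the goal is the element-wise sum of each pair of arrays, and it assumes every array in `x` and `y` has the same shape.

```julia
# Hypothetical alternative to `loop`: plain nested loops, no closures or broadcast.
# Assumes x and y are tuples of equally shaped arrays.
function loop_plain(x, y)
    # One output array per tuple entry; the element type is promoted from the
    # inputs so that non-Float64 element types are not forced into Float64.
    out = map((xk, yk) -> zeros(promote_type(eltype(xk), eltype(yk)), size(xk)), x, y)
    for k in 1:length(x)                      # loop over tuple entries
        xk, yk, outk = x[k], y[k], out[k]
        for i in eachindex(xk, yk, outk)      # loop over array entries
            outk[i] = xk[i] + yk[i]
        end
    end
    return out
end

x = ntuple(_ -> randn(2, 4), 2)
y = ntuple(_ -> randn(2, 4), 2)
out = loop_plain(x, y)   # tuple of two 2×4 matrices
```

No timing is claimed for this sketch; running `@btime loop_plain($x,$y)` alongside the two calls above would show whether the remaining allocations are only the output arrays.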