N = 1024 ^ 2;
A = [ones(N), ones(N), ones(N), ones(N)];

The naive way of summing the arrays in A would be to do sum(A). However, note the performance:

julia> @btime sum(A);
7.968 ms (6 allocations: 24.00 MiB)

It unnecessarily allocates an array of size N for each addition (3x8 MB). Is there a way around this, or is the recommended approach to write a for loop, i.e.:

julia> @btime (S = copy(A[1]); for B in A[2:end]; S .+= B; end; S);
4.818 ms (13 allocations: 8.00 MiB)

Thanks! But the result I’m after here is an array which is the element wise sum of the 4 other arrays, and not the total sum of the individual elements.

If you don’t mind overwriting A then this would work:

A = [ones(2^10) for i=1:4];
sum(A)'
reduce((x,y) -> begin x .+= y; x end, A)'
A
using BenchmarkTools
@btime sum(A);
@btime reduce((x,y) -> begin x .+= y; x end, $A);

Edit: perhaps I should think more about foldl vs reduce here, but right now I think it’s OK.

N = 1024 ^ 2;
A = [ones(N), ones(N), ones(N), ones(N)];
a ⊕ b = a .+= b
@time r = reduce(⊕, A, init=zeros(size(A[1])))
0.009119 seconds (15 allocations: 8.001 MiB)

If the operator (⊕, \oplus) looks confusing, I found this interesting operator that might better correspond to what is done ⥆ \leftarrowplus