Summing arrays efficiently

bennedich · September 22, 2018, 12:43pm

N = 1024 ^ 2;
A = [ones(N), ones(N), ones(N), ones(N)];

The naive way of summing the arrays in A would be to do sum(A). However, note the performance:

julia> @btime sum(A);
  7.968 ms (6 allocations: 24.00 MiB)

It unnecessarily allocates an array of size N for each addition (3x8 MB). Is there a way around this, or is the recommended approach to write a for loop, i.e.:

julia> @btime (S = copy(A[1]); for B in A[2:end]; S .+= B; end; S);
  4.818 ms (13 allocations: 8.00 MiB)

baggepinnen · September 22, 2018, 1:10pm

Try sum(sum.(A))

julia> @btime sum(sum.($A))
  1.876 ms (1 allocation: 112 bytes)
4.194304e6

baggepinnen · September 22, 2018, 1:13pm

Or, to go completely allocation free

julia> @btime sum(a->sum(a),$A)
  1.873 ms (0 allocations: 0 bytes)
4.194304e6

bennedich · September 22, 2018, 1:26pm

Thanks! But the result I’m after here is an array which is the element wise sum of the 4 other arrays, and not the total sum of the individual elements.

improbable22 · September 22, 2018, 1:42pm

If you don’t mind overwriting A then this would work:

A = [ones(2^10) for i=1:4];
sum(A)'
reduce((x,y) -> begin x .+= y; x end, A)'
A

using BenchmarkTools
@btime sum(A);
@btime reduce((x,y) -> begin x .+= y; x end, $A);

Edit: perhaps I should think more about foldl vs reduce here, but right now I think it’s OK.

baggepinnen · September 22, 2018, 1:45pm

Or similarily, but without overwriting A

N = 1024 ^ 2;
A = [ones(N), ones(N), ones(N), ones(N)];
@time r = zeros(size(A[1]));reduce((x,y)->(x .+= y),A, init=r);
 0.002367 seconds (7 allocations: 8.000 MiB)

baggepinnen · September 22, 2018, 1:49pm

This version looks a bit more pleasing

N = 1024 ^ 2;
A = [ones(N), ones(N), ones(N), ones(N)];
a ⊕ b = a .+= b
@time r = reduce(⊕, A, init=zeros(size(A[1])))
 0.009119 seconds (15 allocations: 8.001 MiB)

If the operator (⊕, \oplus) looks confusing, I found this interesting operator that might better correspond to what is done ⥆ \leftarrowplus

bennedich · September 22, 2018, 2:16pm

Thanks! That should work. I modified it to make it slightly more efficient:

julia> @btime reduce(⊕, view(A,2:length(A)), init=copy(A[1]));
  3.578 ms (8 allocations: 8.00 MiB)

DNF · September 22, 2018, 2:20pm

Shouldn’t that be sum(sum, A)?

baggepinnen · September 22, 2018, 2:20pm

Yes you are right!

Topic		Replies	Views
Puzzled by the difference in performance when summing an array in different ways Performance array , memory-allocation	4	867	July 19, 2019
Generator expressions New to Julia memory-allocation , generator	4	448	February 20, 2024
Assistance in speeding up Array operations and summations New to Julia performance , arrays , sum	12	534	August 23, 2023
How to match performance of sum(A, dims=1)? Performance	14	1069	September 21, 2022
How to improve performance of sum() Performance question , fast-math , tullio , loopvectorization , sum	19	4735	July 19, 2021

Summing arrays efficiently

Related topics