New array from a sum of N values in array A?

I’m sorry if the title is a little bit misleading, but I’m having trouble explaining it in one sentence

Assume I have an array A

A = [1, 2, 3, 4, 5];

What I want is to create an array B that will contain sums of every N elements, using a standard for loop this would look like:

B = [];
N = 3;
for i = 1:length(A)-N+1
    push!(B, sum(A[i:i+N-1]));
end

which produces

3-element Array{Any,1}:
  6
  9
 12

now, assume the array A contains tens of thousands of values… how to do this efficiently in Julia?

I’d do it by adding the next element to the previous sum and subtracting the one that moves out of the window:

function movingsum(a, width)
    b = Vector{eltype(a)}(undef, length(a)-width+1)
    s = sum(@view a[1:width])
    for i in 1:length(b)-1
        b[i] = s
        s += a[width+i] - a[i]
    end
    b[end] = s
    return b
end
3 Likes

rolling sum or moving sum?

See Rolling Sum

I think Tullio has an interesting solution

A = rand(100_000_000);

using Tullio

N = 3
res = zeros(eltype(A), length(A))

using Tullio
@time @tullio res[j] = A[j] + A[j+1] + A[j+2]
@time @tullio res[j] = A[j] + A[j+1] + A[j+2] # 0.2s
1 Like

You can also write this:

julia> A = 1:6; N = 3;
julia> using Tullio

julia> @tullio S[i] := A[i+k-1]  (k in 1:N)
4-element Array{Int64,1}:
  6
  9
 12
 15

julia> [sum(@view A[i:i+N-1]) for i in 1:length(A)-N+1]
4-element Array{Int64,1}:
  6
  9
 12
 15

As the linked “Rolling Sum” thread points out, both of these need to get each element of A about N times, while something like movingsum only gets each one once. Conversely I suppose the accumulation in s may run out of precision in some cases:

julia> A32 = Float32.(inv.(1:10^8));

julia> movingsum(A32, 100)[end]
4.390179f-6

julia> sum(A32[end-99:end])
1.0000006f-6
4 Likes

Kahan summation is left as an exercise to the reader :slight_smile:

Still, an N-pass algorithm may be faster for small N due to better memory access pattern. I kinda prematurely optimized for “wide” summation windows.

1 Like

Adding one comment to help: one important feature of the solution proposed by Vasily is that he created vector b before the loop instead of using push! This will make a big difference relative to the initial proposal.

Of course, that and not having to run over elements repeatedly.

1 Like

Note that repeated push! is pretty efficient (amortized linear time) in Julia. (However, in the initial post, B = [] creates a Vector{Any}, which has inefficient abstractly typed elements.)

2 Likes

There’s still quite a huge difference with and without pre-allocation

using BenchmarkTools

function prealloc_movingsum(a, width)
    b = Vector{eltype(a)}(undef, length(a)-width+1)
    s = sum(@view a[1:width])
    for i in 1:length(b)-1
        b[i] = s
        s += a[width+i] - a[i]
    end
    b[end] = s
    return b
end

function noprealloc_movingsum(a, width)
    b = eltype(a)[]
    s = sum(@view a[1:width])
    push!(b, s)
    for i in 1:length(a)-width
        s += a[width+i] - a[i]
        push!(b, s)
    end
    return b
end

A = rand(100);
@btime prealloc_movingsum($A, 5);
126.246 ns (1 allocation: 896 bytes)

@btime noprealloc_movingsum($A, 5);
742.203 ns (7 allocations: 2.20 KiB)
1 Like

Yes, I was just testing the same thing (not exactly in the same sum):

using Test
using BenchmarkTools

x = rand(1000)

function f(x)
  y = [ x[1] ]
  for i in 2:length(x)
    y = push!(y,x[i] + y[i-1])
  end
  return y
end

function g(x)
  y = Vector{Float64}(undef,length(x))
  y[1] = x[1]
  for i in 2:length(x)
    y[i] = x[i] + y[i-1]
  end
  return y
end

@test sum(f(x)) ≈ sum(g(x))

println("push:")
@btime f($x)

println("vector:")
@btime g($x)

println("end")
push:
  5.666 μs (10 allocations: 16.41 KiB)
vector:
  1.375 μs (1 allocation: 7.94 KiB)
end

edit: Just add further information, as pointed by @stevengj, the fact that the vector is of type Any is also extremely detrimental to performance (also for a factor of ~5):

function h(x)
  y = Vector{Any}(undef,0)
  push!(y,x[1])
  for i in 2:length(x)
    y = push!(y,x[i] + y[i-1])
  end
  return y
end
julia> @btime h($x)
Any:
  24.734 μs (2009 allocations: 47.63 KiB)


2 Likes