I’d do it by adding the next element to the previous sum and subtracting the one that moves out of the window:

function movingsum(a, width)
b = Vector{eltype(a)}(undef, length(a)-width+1)
s = sum(@view a[1:width])
for i in 1:length(b)-1
b[i] = s
s += a[width+i] - a[i]
end
b[end] = s
return b
end

julia> A = 1:6; N = 3;
julia> using Tullio
julia> @tullio S[i] := A[i+k-1] (k in 1:N)
4-element Array{Int64,1}:
6
9
12
15
julia> [sum(@view A[i:i+N-1]) for i in 1:length(A)-N+1]
4-element Array{Int64,1}:
6
9
12
15

As the linked “Rolling Sum” thread points out, both of these need to get each element of A about N times, while something like movingsum only gets each one once. Conversely I suppose the accumulation in s may run out of precision in some cases:

Adding one comment to help: one important feature of the solution proposed by Vasily is that he created vector b before the loop instead of using push! This will make a big difference relative to the initial proposal.

Of course, that and not having to run over elements repeatedly.

Note that repeated push! is pretty efficient (amortized linear time) in Julia. (However, in the initial post, B = [] creates a Vector{Any}, which has inefficient abstractly typed elements.)

There’s still quite a huge difference with and without pre-allocation

using BenchmarkTools
function prealloc_movingsum(a, width)
b = Vector{eltype(a)}(undef, length(a)-width+1)
s = sum(@view a[1:width])
for i in 1:length(b)-1
b[i] = s
s += a[width+i] - a[i]
end
b[end] = s
return b
end
function noprealloc_movingsum(a, width)
b = eltype(a)[]
s = sum(@view a[1:width])
push!(b, s)
for i in 1:length(a)-width
s += a[width+i] - a[i]
push!(b, s)
end
return b
end
A = rand(100);
@btime prealloc_movingsum($A, 5);
126.246 ns (1 allocation: 896 bytes)
@btime noprealloc_movingsum($A, 5);
742.203 ns (7 allocations: 2.20 KiB)

Yes, I was just testing the same thing (not exactly in the same sum):

using Test
using BenchmarkTools
x = rand(1000)
function f(x)
y = [ x[1] ]
for i in 2:length(x)
y = push!(y,x[i] + y[i-1])
end
return y
end
function g(x)
y = Vector{Float64}(undef,length(x))
y[1] = x[1]
for i in 2:length(x)
y[i] = x[i] + y[i-1]
end
return y
end
@test sum(f(x)) ≈ sum(g(x))
println("push:")
@btime f($x)
println("vector:")
@btime g($x)
println("end")

edit: Just add further information, as pointed by @stevengj, the fact that the vector is of type Any is also extremely detrimental to performance (also for a factor of ~5):

function h(x)
y = Vector{Any}(undef,0)
push!(y,x[1])
for i in 2:length(x)
y = push!(y,x[i] + y[i-1])
end
return y
end
julia> @btime h($x)
Any:
24.734 μs (2009 allocations: 47.63 KiB)