Rolling Sum

What is the recommended way of doing a rolling / moving sum?

I’ve found packages RollingFunctions.jl and MarketTechnicals.jl, but neither has this.

1 Like

What’s wrong with the RollingFunctions approach?

ajulia> using RollingFunctions

julia> a = rand(10);

julia> rolling(sum, a, 5)
6-element Array{Float64,1}:
1 Like

Here’s a guess: since rolling accepts any function, it will do function-specific optimizations. In the case of sum, the best way to calculate a rolling sum is not to apply sum to successive windows. Instead, you’d want to, in each iteration, calculate the rolling sum as being the previous sum plus the new element entering the window, minus the last element leaving the window. The fact that this isn’t a function in RollingFunctions.jl makes me wonder whether I’m missing something


Not too hard to roll a simple one (for simple situations) which might be faster than the more generic approach.

function rolling_sum(a, n::Int)
    @assert 1<=n<=length(a)
    out = similar(a, length(a)-n+1)
    out[1] = sum(a[1:n])
    for i in eachindex(out)[2:end]
        out[i] = out[i-1]-a[i-1]+a[i+n-1]
    return out

This seems to work but no warranty implied, etc.

P.S. Just for fun, here’s a threaded version, which with four threads gives me about 2x performance.

function trolling_sum(a, n::Int)
    @assert 1<=n<=length(a)
    if nseg*n >= length(a)
        return rolling_sum(a,n)
        out = similar(a, length(a)-n+1)
        lseg = (length(out)-1)÷nseg+1
        segments = [(i*lseg+1, min(length(out),(i+1)*lseg)) for i in 0:nseg-1]
        Threads.@threads for (start, stop) in segments
            out[start] = sum(a[start:start+n-1])
            for i in start+1:stop
                out[i] = out[i-1]-a[i-1]+a[i+n-1]
        return out

I’m not completely sure about thread safety here. All threads write to out, but never to the same location in out.


@klaff exactly, I wrote something which is basically the same, and according to btime it’s about 10x faster than RollingFunctions.rolling(sum, …) (12ms, 5 allocations vs 135ms, 2M+ allocations on a 2million+ Vector{Float64})

I still think I’m missing something… why doesn’t RollingFunctions have this built-in? Or why isn’t there a Base function etc. I just assumed this existed and I couldn’t find it.

I am not familiar with this package, but if this is the case, you can add a specialized method on ::typeof(sum).


To me, it seems that the thing missing is a rolling function that takes both a function and it’s inverse. That would allow this approach to be done genetically for a wife variety of functions.


@Oscar_Smith Interesting. I don’t really know that approach. Care to explain how that would work exactly?

The idea is that for functions with inverse, you can apply the function where you used +, Ave the inverse where you used -.

1 Like

I assume you’re thinking along the lines of how the accumulating statistics functions on HP calculators worked, with a Σ+ key and a Σ- key, where those would add or remove x,y sample pairs, not by keeping a list but by appropriately updating the stats registers of Σx, Σy, etc.

1 Like

As suspected:

So… make a pull-request with your suggested optimization? That’s the beauty of open-source — you can lean on others work and collaborate to improve it.


ImageFiltering has generic operations for N-dimensional arrays. But if all you want is the sum, why not use cumsum?


There is a reason not to do the fast method, depending on your application.

julia> a=randn(1_000_000_000)
1000000000-element Array{Float64,1}:

julia> rolling_sum(a, 20)[end]

julia> sum(a[end-19:end])

You probably don’t care about the last 4 digits being different due to accumulated error, although you could reset the error periodically if it was an issue.


You could also try to make it faster using @view and parallelising. For example via:

using Distributed;
addprocs(2) # you choose

function rolling_sum(X::Array{Float64,1}, window::Int64)

    # Compute rolling sum
    Y = @sync @distributed (+) for i=1:window

    # Return rolling sum
    return Y;

I think you might see better results for large windows and more complicated functions to roll. I do not know the RollingFunctions package very well. If there is some space I might do a commit.

See the following conversation, there are also some code examples in comments. In particular, this comment:

I don’t think the rolling_sum function is amenable to parallel processing or multiple threads until the number of cores exceeds the length of the window, and then some. The out[i] = out[i-1]-a[i-1]+a[i+n-1] technique is inherently sequential, and so can’t really take advantage of SIMD. The other technique might be able to use SIMD but because it’s operations grow with length(data)*n rather than length(data), there’s a lot of inefficiency to be made up for.

EDIT: I take it back, but the problem would need to be split lengthwise, not at a deep loop level. So every 10,000 or so you’d reinitialize the sum and launch another sequentially computing task.

I wanted a rolling sum, not a cumulative sum.

Yes, but you can subtract: since

c_k = \sum_{i = 1}^k x_i

then a rolling sum of the previous n entries is just c_k - c_{k-n}.


I see. Ok, maybe I’m falling into the premature optimization rabbit hole.

1 Like