Rolling Sum

ggjp · November 6, 2019, 2:01pm

What is the recommended way of doing a rolling / moving sum?

I’ve found packages RollingFunctions.jl and MarketTechnicals.jl, but neither has this.

nilshg · November 6, 2019, 2:10pm

What’s wrong with the RollingFunctions approach?

ajulia> using RollingFunctions

julia> a = rand(10);

julia> rolling(sum, a, 5)
6-element Array{Float64,1}:
 3.2334084082359906
 3.8027887140089467
 3.785573015992105
 3.991553119385805
 3.547388812871529
 3.221195695490038

ggjp · November 6, 2019, 2:20pm

Here’s a guess: since rolling accepts any function, it will do function-specific optimizations. In the case of sum, the best way to calculate a rolling sum is not to apply sum to successive windows. Instead, you’d want to, in each iteration, calculate the rolling sum as being the previous sum plus the new element entering the window, minus the last element leaving the window. The fact that this isn’t a function in RollingFunctions.jl makes me wonder whether I’m missing something

klaff · November 6, 2019, 2:32pm

Not too hard to roll a simple one (for simple situations) which might be faster than the more generic approach.

function rolling_sum(a, n::Int)
    @assert 1<=n<=length(a)
    out = similar(a, length(a)-n+1)
    out[1] = sum(a[1:n])
    for i in eachindex(out)[2:end]
        out[i] = out[i-1]-a[i-1]+a[i+n-1]
    end
    return out
end

This seems to work but no warranty implied, etc.

P.S. Just for fun, here’s a threaded version, which with four threads gives me about 2x performance.

function trolling_sum(a, n::Int)
    @assert 1<=n<=length(a)
    nseg=4
    if nseg*n >= length(a)
        return rolling_sum(a,n)
    else
        out = similar(a, length(a)-n+1)
        lseg = (length(out)-1)÷nseg+1
        segments = [(i*lseg+1, min(length(out),(i+1)*lseg)) for i in 0:nseg-1]
        Threads.@threads for (start, stop) in segments
            out[start] = sum(a[start:start+n-1])
            for i in start+1:stop
                out[i] = out[i-1]-a[i-1]+a[i+n-1]
            end
        end
        return out
    end
end

I’m not completely sure about thread safety here. All threads write to out, but never to the same location in out.

ggjp · November 6, 2019, 2:35pm

@klaff exactly, I wrote something which is basically the same, and according to btime it’s about 10x faster than RollingFunctions.rolling(sum, …) (12ms, 5 allocations vs 135ms, 2M+ allocations on a 2million+ Vector{Float64})

I still think I’m missing something… why doesn’t RollingFunctions have this built-in? Or why isn’t there a Base function etc. I just assumed this existed and I couldn’t find it.

Tamas_Papp · November 6, 2019, 2:50pm

I am not familiar with this package, but if this is the case, you can add a specialized method on ::typeof(sum).

Oscar_Smith · November 6, 2019, 3:07pm

To me, it seems that the thing missing is a rolling function that takes both a function and it’s inverse. That would allow this approach to be done genetically for a wife variety of functions.

ggjp · November 6, 2019, 3:10pm

@Oscar_Smith Interesting. I don’t really know that approach. Care to explain how that would work exactly?

Oscar_Smith · November 6, 2019, 3:14pm

The idea is that for functions with inverse, you can apply the function where you used +, Ave the inverse where you used -.

klaff · November 6, 2019, 3:28pm

I assume you’re thinking along the lines of how the accumulating statistics functions on HP calculators worked, with a Σ+ key and a Σ- key, where those would add or remove x,y sample pairs, not by keeping a list but by appropriately updating the stats registers of Σx, Σy, etc.

ggjp · November 6, 2019, 4:09pm

As suspected: https://github.com/JeffreySarnoff/RollingFunctions.jl/blob/f42c3e28668d63191daf93e4b62f81d9f98c1b40/src/roll/rolling.jl#L6-L16

mbauman · November 6, 2019, 4:18pm

So… make a pull-request with your suggested optimization? That’s the beauty of open-source — you can lean on others work and collaborate to improve it.

tim.holy · November 6, 2019, 4:28pm

ImageFiltering has generic operations for N-dimensional arrays. But if all you want is the sum, why not use cumsum?

klaff · November 6, 2019, 5:52pm

There is a reason not to do the fast method, depending on your application.

julia> a=randn(1_000_000_000)
1000000000-element Array{Float64,1}:
 -1.2344126204756887
 -0.8239991893593851
  1.178705934119596
 -0.013067916777890981
 -2.0283561056107553
  0.2622616832126358
 -0.6759869411454665
 -0.5361857977094139
  0.8097520205869354
  0.32388073554507785
 -2.527704302672418
  ⋮
 -0.6576524013369835
 -0.23953427627419813
 -0.028670388394535708
 -1.5805848784975132
  0.48150455056217334
  1.5563777415762128
 -2.8107725516409716
 -1.0499675458464977
 -0.27204237866643
 -1.3917792177726476

julia> rolling_sum(a, 20)[end]
-5.901424969621745

julia> sum(a[end-19:end])
-5.901424969620317

You probably don’t care about the last 4 digits being different due to accumulated error, although you could reset the error periodically if it was an issue.

fipelle · November 6, 2019, 6:35pm

You could also try to make it faster using @view and parallelising. For example via:

using Distributed;
addprocs(2) # you choose

function rolling_sum(X::Array{Float64,1}, window::Int64)

    # Compute rolling sum
    Y = @sync @distributed (+) for i=1:window
        @view(X[i:end-(window-i)])
    end

    # Return rolling sum
    return Y;
end

I think you might see better results for large windows and more complicated functions to roll. I do not know the RollingFunctions package very well. If there is some space I might do a commit.

sairus7 · November 6, 2019, 6:37pm

See the following conversation, there are also some code examples in comments. In particular, this comment:

klaff · November 6, 2019, 7:28pm

I don’t think the rolling_sum function is amenable to parallel processing or multiple threads until the number of cores exceeds the length of the window, and then some. The out[i] = out[i-1]-a[i-1]+a[i+n-1] technique is inherently sequential, and so can’t really take advantage of SIMD. The other technique might be able to use SIMD but because it’s operations grow with length(data)*n rather than length(data), there’s a lot of inefficiency to be made up for.

EDIT: I take it back, but the problem would need to be split lengthwise, not at a deep loop level. So every 10,000 or so you’d reinitialize the sum and launch another sequentially computing task.

ggjp · November 7, 2019, 10:06am

I wanted a rolling sum, not a cumulative sum.

tim.holy · November 7, 2019, 10:09am

Yes, but you can subtract: since

c_k = \sum_{i = 1}^k x_i

then a rolling sum of the previous n entries is just c_k - c_{k-n}.

ggjp · November 7, 2019, 12:44pm

I see. Ok, maybe I’m falling into the premature optimization rabbit hole.

Topic		Replies	Views
Rollsum/cumulative optimisation Performance question	12	311	June 5, 2023
A weird operation to optimize General Usage	13	102	May 3, 2025
A fast sum. Any downsides? Performance sum	18	958	December 16, 2024
How to improve performance of simple averaging function Performance	4	899	April 19, 2019
How to compute a "cumulative" in a dataframe (without a for loop) Data question , dataframes	44	9515	September 11, 2021

Rolling Sum

Related topics