What is the recommended way of doing a rolling / moving sum?

I’ve found packages RollingFunctions.jl and MarketTechnicals.jl, but neither has this.

What is the recommended way of doing a rolling / moving sum?

I’ve found packages RollingFunctions.jl and MarketTechnicals.jl, but neither has this.

1 Like

What’s wrong with the RollingFunctions approach?

```
ajulia> using RollingFunctions
julia> a = rand(10);
julia> rolling(sum, a, 5)
6-element Array{Float64,1}:
3.2334084082359906
3.8027887140089467
3.785573015992105
3.991553119385805
3.547388812871529
3.221195695490038
```

Here’s a guess: since `rolling`

accepts any function, it will do function-specific optimizations. In the case of sum, the best way to calculate a rolling sum is not to apply sum to successive windows. Instead, you’d want to, in each iteration, calculate the rolling sum as being the previous sum plus the new element entering the window, minus the last element leaving the window. The fact that this isn’t a function in RollingFunctions.jl makes me wonder whether I’m missing something

1 Like

Not too hard to roll a simple one (for simple situations) which might be faster than the more generic approach.

```
function rolling_sum(a, n::Int)
@assert 1<=n<=length(a)
out = similar(a, length(a)-n+1)
out[1] = sum(a[1:n])
for i in eachindex(out)[2:end]
out[i] = out[i-1]-a[i-1]+a[i+n-1]
end
return out
end
```

This seems to work but no warranty implied, etc.

P.S. Just for fun, here’s a threaded version, which with four threads gives me about 2x performance.

```
function trolling_sum(a, n::Int)
@assert 1<=n<=length(a)
nseg=4
if nseg*n >= length(a)
return rolling_sum(a,n)
else
out = similar(a, length(a)-n+1)
lseg = (length(out)-1)÷nseg+1
segments = [(i*lseg+1, min(length(out),(i+1)*lseg)) for i in 0:nseg-1]
Threads.@threads for (start, stop) in segments
out[start] = sum(a[start:start+n-1])
for i in start+1:stop
out[i] = out[i-1]-a[i-1]+a[i+n-1]
end
end
return out
end
end
```

I’m not completely sure about thread safety here. All threads write to `out`

, but never to the same location in `out`

.

3 Likes

@klaff exactly, I wrote something which is basically the same, and according to btime it’s about 10x faster than RollingFunctions.rolling(sum, …) (12ms, 5 allocations vs 135ms, 2M+ allocations on a 2million+ Vector{Float64})

I still think I’m missing something… why doesn’t RollingFunctions have this built-in? Or why isn’t there a Base function etc. I just assumed this existed and I couldn’t find it.

I am not familiar with this package, but if this is the case, you can add a specialized method on `::typeof(sum)`

.

3 Likes

To me, it seems that the thing missing is a rolling function that takes both a function and it’s inverse. That would allow this approach to be done genetically for a wife variety of functions.

3 Likes

@Oscar_Smith Interesting. I don’t really know that approach. Care to explain how that would work exactly?

The idea is that for functions with inverse, you can apply the function where you used +, Ave the inverse where you used -.

1 Like

I assume you’re thinking along the lines of how the accumulating statistics functions on HP calculators worked, with a `Σ+`

key and a `Σ-`

key, where those would add or remove `x,y`

sample pairs, not by keeping a list but by appropriately updating the stats registers of `Σx`

, `Σy`

, etc.

1 Like

As suspected: https://github.com/JeffreySarnoff/RollingFunctions.jl/blob/f42c3e28668d63191daf93e4b62f81d9f98c1b40/src/roll/rolling.jl#L6-L16

So… make a pull-request with your suggested optimization? That’s the beauty of open-source — you can lean on others work and collaborate to improve it.

7 Likes

ImageFiltering has generic operations for N-dimensional arrays. But if all you want is the sum, why not use `cumsum`

?

3 Likes

There is a reason not to do the fast method, depending on your application.

```
julia> a=randn(1_000_000_000)
1000000000-element Array{Float64,1}:
-1.2344126204756887
-0.8239991893593851
1.178705934119596
-0.013067916777890981
-2.0283561056107553
0.2622616832126358
-0.6759869411454665
-0.5361857977094139
0.8097520205869354
0.32388073554507785
-2.527704302672418
⋮
-0.6576524013369835
-0.23953427627419813
-0.028670388394535708
-1.5805848784975132
0.48150455056217334
1.5563777415762128
-2.8107725516409716
-1.0499675458464977
-0.27204237866643
-1.3917792177726476
julia> rolling_sum(a, 20)[end]
-5.901424969621745
julia> sum(a[end-19:end])
-5.901424969620317
```

You *probably* don’t care about the last 4 digits being different due to accumulated error, although you could reset the error periodically if it was an issue.

3 Likes

You could also try to make it faster using `@view`

and parallelising. For example via:

```
using Distributed;
addprocs(2) # you choose
function rolling_sum(X::Array{Float64,1}, window::Int64)
# Compute rolling sum
Y = @sync @distributed (+) for i=1:window
@view(X[i:end-(window-i)])
end
# Return rolling sum
return Y;
end
```

I think you might see better results for large windows and more complicated functions to roll. I do not know the `RollingFunctions`

package very well. If there is some space I might do a commit.

See the following conversation, there are also some code examples in comments. In particular, this comment:

I don’t think the rolling_sum function is amenable to parallel processing or multiple threads until the number of cores exceeds the length of the window, and then some. The `out[i] = out[i-1]-a[i-1]+a[i+n-1]`

technique is inherently sequential, and so can’t really take advantage of SIMD. The other technique might be able to use SIMD but because it’s operations grow with length(data)*n rather than length(data), there’s a lot of inefficiency to be made up for.

EDIT: I take it back, but the problem would need to be split lengthwise, not at a deep loop level. So every 10,000 or so you’d reinitialize the sum and launch another sequentially computing task.

I wanted a rolling sum, not a cumulative sum.

Yes, but you can subtract: since

c_k = \sum_{i = 1}^k x_i

then a rolling sum of the previous n entries is just c_k - c_{k-n}.

4 Likes

I see. Ok, maybe I’m falling into the premature optimization rabbit hole.

1 Like