# Rolling maxima

Hi all fine people here!

I am just interested in knowing if anyone here sees any performance improvements in my code. Either algorithmically or Julia specific.

The code in question is a function which calculates local maxima and minima.
The function does this by calculating the rolling maximum (the maximum in a moving window) and uses that to obtain local optima.

Below is the code! vect is usually on the order of 1000 entries.

``````using DataStructures

function rolling_optima(vect::Any, win_size::Integer; debug=false)

roll_max = zeros(Float64, length(vect))
roll_min = zeros(Float64, length(vect))
opt_index = zeros(Int16, length(vect))
opt_type = zeros(Int8, length(vect))
d_max = Deque{Float64}()
d_min = Deque{Float64}()
win_forward = win_size
win_backward = win_forward*2
opt_ind = 0
#comp = 10
@inbounds for i in eachindex(vect)
@inbounds for k in 1:length(d_max)
if vect[i] >= front(d_max) popfirst!(d_max)
else break end
end
@inbounds for k in 1:length(d_min)
if vect[i] <= front(d_min) popfirst!(d_min)
else break end
end
pushfirst!(d_max,vect[i])
pushfirst!(d_min,vect[i])
j = i + 1 - win_forward
if j >= 1
roll_max[i] = back(d_max)
roll_min[i] = back(d_min)
if vect[j] == back(d_max)
if roll_max[i] == roll_max[j]
if j-win_backward > 0
if roll_max[j] > maximum(vect[j-win_backward:j-1])
opt_ind +=1
opt_index[opt_ind] = j
opt_type[opt_ind] = 1
end
end
end
pop!(d_max)
end
if vect[j] == back(d_min)
if roll_min[i] == roll_min[j]
if j-win_backward > 0
if roll_min[j] < minimum(vect[j-win_backward:j-1])
opt_ind +=1
opt_index[opt_ind] = j
opt_type[opt_ind] = -1
end
end
end
pop!(d_min)
end
end
end
opt_index = opt_index[1:opt_ind]
opt_type = opt_type[1:opt_ind]
if debug
return opt_index, opt_type, roll_max, roll_min
else
return opt_index, opt_type
end
end
``````

What definition are you using? The usual definition of a local maximum would be a value that is â‰Ą both its neighbors, which you could implement in a trivial loop that checks `vect[i-1] â‰¤ vect[i] â‰Ą vect[i+1]`.

To be defined as maxima there should be no greater entries win_forward places after the spot and win_backward places before that spot. So it has to check a larger area around each point.

Have you looked at the existing implementations in the ecosystem such as the one in RollingFunctions.jl to see if they meet your needs?

1 Like

Unless your windows are very small, there are more efficient approaches than checking each neighbor. https://github.com/JuliaImages/ImageFiltering.jl uses an algorithm here: http://arxiv.org/abs/cs.DS/0610046.

While it seems you need locations as well as values, I would bet you could use the ImageFiltering code directly by creating a new AbstractArray wrapper where indexing returns an `index=>val` pair and comparison looked only at the `val`. (Youâ€™d want to define a custom Pair type so that you werenâ€™t pirating Baseâ€™s Pair.)

3 Likes

Thanks! I will look into that. My current algorithm does not do the naive thing of checking all neighbours every step though. So I donâ€™t think there is a huge win in performance in improving the algorithm (I can be wrong) but I know that memory handling is often at least as important as the algorithm in practice.

Having looked at the pseudo code from the article I could improve the runtime of my code by about 10-15%. I already used most of the concepts from that article, why the improvement wasnâ€™t larger.

The main improvement came from realising that the algorithm either finds a larger or a smaller element than the previous one in the same step, and never none of them and never both, so there could be an if - else instead of two checks. I.e:

``````        if vect[i] > vect[i-1] # ...
``````

If anyone is interested, I post the improved code below. I am still interested in more Julia specific improvements which I suspect would further improve the performance.

Improved code:

``````function rolling_optima(vect::Any, win_size::Integer; debug=false)

roll_max = zeros(Float64, length(vect))
roll_min = zeros(Float64, length(vect))
opt_index = zeros(Int16, length(vect))
opt_type = zeros(Int8, length(vect))
d_max = Deque{Float64}()
d_min = Deque{Float64}()
win_forward = win_size
win_backward = win_forward*2
opt_ind = 0
#comp = 10
pushfirst!(d_max,vect[1])
pushfirst!(d_min,vect[1])
@inbounds for i in 2:length(vect)
if vect[i] > vect[i-1]
@inbounds for k in 1:length(d_max)
if vect[i] >= front(d_max) popfirst!(d_max)
else break end
end
else
@inbounds for k in 1:length(d_min)
if vect[i] <= front(d_min) popfirst!(d_min)
else break end
end
end
pushfirst!(d_max,vect[i])
pushfirst!(d_min,vect[i])
j = i + 1 - win_forward
if j >= 1
roll_max[i] = back(d_max)
roll_min[i] = back(d_min)
if vect[j] == back(d_max)
if roll_max[i] == roll_max[j]
if j-win_backward > 0
if roll_max[j] > maximum(vect[j-win_backward:j-1])
opt_ind +=1
opt_index[opt_ind] = j
opt_type[opt_ind] = 1
end
end
end
pop!(d_max)
elseif vect[j] == back(d_min)
if roll_min[i] == roll_min[j]
if j-win_backward > 0
if roll_min[j] < minimum(vect[j-win_backward:j-1])
opt_ind +=1
opt_index[opt_ind] = j
opt_type[opt_ind] = -1
end
end
end
pop!(d_min)
end
end
end
opt_index = opt_index[1:opt_ind]
opt_type = opt_type[1:opt_ind]
if debug
return opt_index, opt_type, roll_max, roll_min
else
return opt_index, opt_type
end
end
``````
1 Like

Iâ€™m curious to know happens if you replace `Deque{Float64}` with `Vector{Float64}`? (The native `Julia` array type also supports O(1) amortized push/pop from both ends, and Iâ€™m curious to know how it compares.)

Note that `maximum(vect[j-win_backward:j-1])` allocates a copy of the slice; you might want to use `@views` or just replace it with a loop. (And it seems like you could do even better, e.g. with a heap-like data structure, so that you donâ€™t have to loop over the entire window for every `j`.) Similarly for the `minimum` call.

2 Likes

Using `Vector{Float64}` seems to increase the runtime by almost 40% compared to using the `Deque{Float64}`

Yes! This is the part of the code that is the newest functional addition and yes itâ€™s probably far from optimal, however I donâ€™t think the code usually spends a lot of time here anyway so the benefits of optimizing this part are probably small.

Using views here gave about 1.5% of speedup. Nothing to write home about but certainly welcome from such a simple change.

As other have already said the key question is " you donâ€™t need to loop over the entire window for every `j`"

I changed that part of the code to

``````             if j-win_backward > 0
failed = false
for v in @view vect[j-win_backward:j-1]
if roll_max[j] < v
failed = true
break end
end
if !failed
opt_ind +=1
opt_index[opt_ind] = j
opt_type[opt_ind] = 1
end
``````

it added a slight improvement in speed (again a few %). Is that what you were thinking about or were you thinking about something more sophisticated?

I think your algorithm is O(N), so there is nothing more to improve here. I checked that your code allocates a lot, despite of using views. Maybe I have no your latest code, so please veryfy.

``````julia> x = rand(10000);

julia> K = 90;

julia> @time result1 = rolling_optima(x, K, debug=true)[3];
3.253689 seconds (4.66 M allocations: 724.375 MiB, 2.50% gc time)

julia> @time result2 = movingmax(x, K);
0.000350 seconds (10.01 k allocations: 546.469 KiB)

julia> result1[K:end] == result2
true
``````

And my implementation of window max:

``````using LinkedLists

function movingmax(array::AbstractArray{T}, K::Integer) where T <: Real
@assert K > 0
@assert K < length(array)

result = zeros(T, length(array)-K+1)
windowindex = 0
resultindex = 1
for item in array

if !isempty(list) && first(list)[1] == windowindex
popfirst!(list)
end

while !isempty(list) && last(list)[2] < item
pop!(list)
end

push!(list, (windowindex, item))
windowindex = (windowindex + 1) % K

if windowindex == 0
end

result[resultindex] = first(list)[2]
resultindex = resultindex + 1
end
end

return result
end
``````

My latest code below: (note that I have changed the name of a few functions since they were changed in the updated DataStructures.jl package which I now use.

``````function rolling_optima(vect::Any, win_size::Integer; debug=false)

roll_max = zeros(Float64, length(vect))
roll_min = zeros(Float64, length(vect))
opt_index = zeros(Int16, length(vect))
opt_type = zeros(Int8, length(vect))
d_max = Deque{Float64}()
d_min = Deque{Float64}()
win_forward = win_size
win_backward = win_forward*2
opt_ind = 0
#comp = 10
pushfirst!(d_max,vect[1])
pushfirst!(d_min,vect[1])
@inbounds for i in 2:length(vect)

if vect[i] > vect[i-1]
@inbounds for k in 1:length(d_max)
if vect[i] >= first(d_max) popfirst!(d_max)
else break end
end
else
@inbounds for k in 1:length(d_min)
if vect[i] <= first(d_min) popfirst!(d_min)
else break end
end
end
pushfirst!(d_max,vect[i])
pushfirst!(d_min,vect[i])
j = i + 1 - win_forward
if j >= 1
roll_max[i] = last(d_max)
roll_min[i] = last(d_min)
if vect[j] == last(d_max)
if roll_max[i] == roll_max[j]
if j-win_backward > 0
failed = false
for v in @view vect[j-win_backward:j-1]
if roll_max[j] < v
failed = true
break end
end
if !failed
opt_ind +=1
opt_index[opt_ind] = j
opt_type[opt_ind] = 1
end
end
end
pop!(d_max)
elseif vect[j] == last(d_min)
if roll_min[i] == roll_min[j]
if j-win_backward > 0
failed = false
for v in @view vect[j-win_backward:j-1]
if roll_min[j] > v
failed = true
break end
end
if !failed
opt_ind +=1
opt_index[opt_ind] = j
opt_type[opt_ind] = -1
end
end
end
pop!(d_min)
end
end
end
opt_index = opt_index[1:opt_ind]
opt_type = opt_type[1:opt_ind]
if debug
return opt_index, opt_type, roll_max, roll_min
else
return opt_index, opt_type
end
end
``````

Now to your results. It seems that something was strange with running my function. Here are my current results.

``````julia> x = rand(10000)
@time rr = rolling_optima(x, 90, debug=true);
0.000927 seconds (26 allocations: 203.266 KiB)

julia> @time rr = rolling_optima(x, 90, debug=false);
0.000916 seconds (20 allocations: 203.031 KiB)

julia> @btime rr = rolling_optima(\$x, 90, debug=false);
501.711 ÎĽs (16 allocations: 202.88 KiB)

``````

While not as good as the results for your function movingmax not as bad as what you had. (I of course did â€śwarm upâ€ť runs first when using @time) Could you verify my results?

Confirm, I must have been using code before all optimizations

``````julia> @time result1 = rolling_optima(x, K, debug=true)[3];
0.000625 seconds (26 allocations: 203.266 KiB)

julia> @time result2 = movingmax(x, K);
0.000341 seconds (10.01 k allocations: 546.469 KiB)
``````

So I think you are pretty fast now, cause your code is doing much more things.

This is very nice to hear! And you are right, my function is doing more. Do you have a feeling that your function is close to what is possible in terms of speed? Itâ€™s actually very satisfying to se how few allocations my code is currently performing!

In terms of complexity my code (your also) is optimal - O(N). Anyway, I think much more could be improved in my function, for instance when I changed LinkedList to Deque I am 15% faster.

Very nice! Did you get rid of some of the allocations? Maybe the part of my code that does the simple rolling maximum is very similar to your code after thatâ€¦?

Yes, alocations are much lower

``````julia> @btime result1 = rolling_optima(\$x, \$K, debug=true)[3];
439.911 ÎĽs (16 allocations: 202.89 KiB)

julia> @btime result2 = movingmax(\$x, \$K);
243.229 ÎĽs (10003 allocations: 546.31 KiB)

# Now using Deque
julia> @btime result2 = movingmax(\$x, \$K);
208.768 ÎĽs (4 allocations: 93.70 KiB)
``````
1 Like

Just for comparison I reduced my function to a `rolling_maxmin()` and a `rolling_max()` one. The last one of course loses some tricks but should be very close to yours?

``````
function rolling_maxmin(vect::Any, win_size::Integer)

roll_max = zeros(Float64, length(vect))
roll_min = zeros(Float64, length(vect))
d_max = Deque{Float64}()
d_min = Deque{Float64}()
pushfirst!(d_max,vect[1])
pushfirst!(d_min,vect[1])
@inbounds for i in 2:length(vect)

if vect[i] > vect[i-1]
@inbounds for k in 1:length(d_max)
if vect[i] >= first(d_max) popfirst!(d_max)
else break end
end
else
@inbounds for k in 1:length(d_min)
if vect[i] <= first(d_min) popfirst!(d_min)
else break end
end
end
pushfirst!(d_max,vect[i])
pushfirst!(d_min,vect[i])
j = i + 1 - win_size
if j >= 1
roll_max[i] = last(d_max)
roll_min[i] = last(d_min)
if vect[j] == last(d_max)
pop!(d_max)
elseif vect[j] == last(d_min)
pop!(d_min)
end
end
end
return roll_max, roll_min
end

function rolling_max(vect::Any, win_size::Integer)

roll_max = zeros(Float64, length(vect))
d_max = Deque{Float64}()
pushfirst!(d_max,vect[1])
@inbounds for i in 2:length(vect)

if vect[i] > vect[i-1]
@inbounds for k in 1:length(d_max)
if vect[i] >= first(d_max) popfirst!(d_max)
else break end
end
end
pushfirst!(d_max,vect[i])
j = i + 1 - win_size
if j >= 1
roll_max[i] = last(d_max)
if vect[j] == last(d_max)
pop!(d_max)
end
end
end
return roll_max
end
``````