ANN: MaxMinFilters.jl - fast streaming maximum / minimum within moving window

Thank you for the package. Indeed, it would be nice to see a generalization of all these efforts into a single even more powerful package. This is my current list for future reference:

4 Likes

It really has to be a community effort. In other similar cases, I believe that either the parties come together and agree on some standards, often creating a <RelvantDomain>Base.jl package with abstract types and maybe a defined API, which others then build off of.

Alternatively, one opinionated and prolific coder can create the standard and blow everyone away with how good it is, such that an ecosystem builds up around them and becomes the de facto standard cough Rackauckas cough.

The first way is definitely easier, if you can get folks on board.

3 Likes

I support using the best, most performant, generally applicable approaches. So, as a first step, what of your implementation is appropriate to other windowed functionality? Are the ring buffers more facile and faster than viewing sliding indices?

It’s not just about ring buffers, but the algorithm itself - when you move your window 1 point forward, you can use previous calculation result and do not repeat calcularions for window-1 previous samples. This applies to many functions, such as max, min, sort, median, mean, std, and so on.
Also, I think there should be added stateful filters for all windowed functions, so it can be used for streaming processing when new data are available in chunks.

Any references to those next-step refining stats?

Here is a minimal example of running mean, the same applies to higher order stats:

using Statistics

function runmean1(x::Vector{T}, n::Int64=10)::Vector{T} where {T<:Real}
    len = size(x,1)
    @assert n<len && n>1 "Argument n is out of bounds."
    out = zeros(len - n + 1)
    @inbounds for i = n:len
        out[i-n+1] = mean(view(x, i-n+1:i))
    end
    return out
end

function runmean2(x::Vector{T}, n::Int64=10)::Vector{T} where {T<:Real}
    len = size(x,1)
    @assert n<len && n>1 "Argument n is out of bounds."
    out = zeros(len - n + 1)
    sum::T = 0
    @inbounds for i = 1:n
        sum += x[i]
    end
    out[1] = sum / n
    @inbounds for i = n+1:size(x,1)
        sum += x[i] - x[i-n] # sum = previous sum - old point + new point
        out[i-n+1] = sum / n
    end
    return out
end

x = randn(1000000)
w = 100
@time m1 = runmean1(x, w)
@time m2 = runmean2(x, w)
m1 ≈ m2

For nan-stable version you can see my last PR to Indicators.jl:

https://github.com/dysonance/Indicators.jl/pull/20/files

Just FYI, you may be interested in transducers instead of stateful functions, e.g., if you want to squeeze out performance. See: GitHub - JuliaFolds/Transducers.jl: Efficient transducers for Julia / Clojure - Transducers

6 Likes

6 posts were split to a new topic: Using Transducers.jl instead of stateful functions for stream processing

Late to the party here. The algorithm in MaxMinFilters is the same one implemented in ImageFiltering; it just turned out the one in ImageFiltering had been optimized for small window sizes. A small tweak fixed the performance for large windows, so that now its performance is similar to MaxMinFilters.

ImageFiltering also supports delays, supports multiple dimensions, implements boundary conditions, and supports arbitrary functions. I think it’s a reasonable candidate for your “one-size-fits-all” package described in the OP, and I agree that we probably don’t need so many different packages that do the same thing.

6 Likes

Thanks, I have updated benchmark plot for ImageFiltering with that issue fixed - it has constant time now.

What delay are you talking about? I didn’t see any delay notations in ImageFiltering. Is it about filtering image sequence (video stream)?

Regarding to one general package, one impartant case is chunk-based stream processing for growing data with the ability to store filter state until the next data chunk arrives. I don’t think that applies to images, because you don’t usually have, say, 256x256 image that grows in its first dimension (512x256, 768x256, etc.).

1 Like

What delay are you talking about? I didn’t see any delay notations in ImageFiltering.

In the window specification you can supply a range, and that range doesn’t even have to include 0:

julia> mapwindow(extrema, 1:10, (-3:-1,), border="replicate")
10-element Array{Tuple{Int64,Int64},1}:
 (1, 1)
 (1, 1)
 (1, 2)
 (1, 3)
 (2, 4)
 (3, 5)
 (4, 6)
 (5, 7)
 (6, 8)
 (7, 9)

one impartant case is chunk-based stream processing for growing data

True, but we routinely analyze images (movies) that are multiple terabytes where the last dimension is huge. Between Mmap and things like StreamingContainer I think we have almost everything one needs. I think the only missing component would be to split https://github.com/JuliaImages/ImageFiltering.jl/blob/5bcbe300a9af56b85a773b0edb156f72fa6f0594/src/mapwindow.jl#L408 so that it can take the Wedge W as input and return the W and cache upon output. That would allow you to preserve the state if you do need to make multiple calls to handle multiple chunks.

Maybe it’s already mentioned elsewhere, but I find it a bit of a discoverability issue that ImageFiltering contains many things that are useful outside image processing. For example, if I am looking for a function to process a vector, it’s tricky for me to think about ImageFiltering (unless I already know it).

1 Like

Yeah, we should probably rename it ArrayFiltering.jl or something. Would that fix the problem?

7 Likes

Yes, I think it’d make non-image processing people interested in the package more.

We’ve gotten to the point where an image is just an arbitrary-dimensional array where the eltype is <:ColorType. Other than that, JuliaImages is basically an extension of JuliaArrays, one where the notion of “index locality” really matters (i.e., entries with nearby indexes are in some sense related to one another). While some of the manipulations are pretty image-specific (e.g., ImageSegmentation, ImageFeatures), a lot of the material there can be applied to a much wider class of arrays. It makes it a little tricky to decide how to package things.

4 Likes

There is also some increased code complexity/readability, when you want to write generic functions for any dimensionality. And also, when you want to run along different dimension.

Sure, but we’ve worked quite hard to make sure that multidimensional code is easy to write, and very readable, in Julia (Multidimensional algorithms and iteration). Is the small extra complexity of handling arbitrary indexes and arbitrary dimensionality really worth a whole bunch of packages that duplicate functionality?

2 Likes

Yeah, there’s a lot of overlap in DSP.jl’s conv / filt / resample functions - it would be good to do a pass through to see where they should be merged together.

1 Like

I agree. The onus should probably be on me to get that rolling: I’ve been talking about splitting/renaming ImageFiltering for some time, and the package is anyway due for an overhaul to take advantage of some of the nice features of modern Julia versions.

3 Likes

I think, when writing some new functions, it is easier to start with 1-dimensional case for simplicity, then debug it, and then it can be rewritten to multi-dimensional case, if applicable. And if there are some notable performance differences OR extra code complexity, we can leave both implementations.

It’s nice to have one implementation for 2 and more dimensions. But combining 1 and many has some extra burden, e.g. in DataStructures we have 1-dimensional CircularDeque, and there is no need to rewrite it for multidimensional deque with pushes along selected dims. If we really need that, it can be just added in separate source file.