# Find filtered min/max index

Every now and then I need to find the index of the min/max element of a filtered data structure.
I was wondering, how do you normally tackle this quite common problem?

As a first reaction I think of `findmin(o, filter(f, data))` but filtering doesn’t hold the original indices.
I ended up using `findmin(o, skipmissing([f(d) ? d : misssing for d in data])` where I inject `missing `values in the values that should be filtered out.

MWE:

``````julia> using Random

julia> rng = MersenneTwister(0);

julia> data = [(rand(rng), rand(rng)) for _ in 1:10];

julia> dataobjective(d) = d[2];

julia> filterdata(d) = d[1] > 0.5;

julia> findmin(dataobjective, skipmissing([filterdata(d) ? d : missing for d in data]))
(0.5392892841426182, 6)
``````

It’s very convenient that `skipmissing` is propagating the original indices. Is there something like `skipfilter` that would do exactly as `skipmissing` but for elements satisfying a predicate ?
And how would you solve the above problem ?

Check out @aplavin’s Skipper.jl, which provides an arbitrary-predicate `skip` that should totally be in Base imo.

4 Likes
``````minimum(t for t in  zip(last.(data),eachindex(data)) if t[1]>0.5)
``````

the elegant @lmiq solution applied to the (modified )OP case

``````findmin(x -> x[2] > 0.5 ? x[2] : +Inf, data)
``````

You can use the predicate to filter, for example:

``````julia> data = collect(1:10)
10-element Vector{Int64}:
1
2
3
4
5
6
7
8
9
10

julia> findmin(x -> x > 3 ? x : +Inf, data)
(4, 4)
``````

or, similar to the OP example:

``````julia> data = [(rand(),rand()) for _ in 1:10];

julia> findmin(d -> d[1] > 0.5 ? d[2] : +Inf, data)
(0.42413123524471796, 8)
``````
1 Like

okay. I read the OP more carefully

``````minimum((t[1][2],t[2]) for t in  zip(data, eachindex(data)) if t[1][1] > 0.5)
``````

I also had problem finding it with google/bing. It appears they don’t parse gitlab that well as github.

1 Like

``````julia> withmissing = [filterdata(d) ? d : missing for d in data]

julia> findmin(map(withmissing) do x
ismissing(x) ? (true,0.0) : (false, dataobjective(x))
end)
((false, 0.5392892841426182), 6)
``````

This generalizes well to other cases, but needs a tiny addition to access the value. More importantly, it handles the all missing case which could potentially cause trouble.

Or, same as rocco_sprmnt21 's but perhaps more readable:

``````minimum( [ (dataobjective(x),i) for (i,x) in enumerate(withmissing)
if !ismissing(x) ] )
``````

That should definitely go to base. Why it is not?

Just checked it out:

``````julia> using Skipper

julia> data = collect(1:10);

julia> findmin(skip(x -> x < 4, data))
(4, 4)

julia> @btime findmin(skip(x -> x < 4, \$data))
21.599 ns (0 allocations: 0 bytes)
(4, 4)
``````
2 Likes

Aside from inverting the predicate, what is the difference between `skip` and `Iterators.filter`?

The intent of `filter` is to construct an entirely new collection excluding the filtered elements, and `Iterators.filter` is a lazy variant.

The intent of `skipmissing` (and `skip`) is different: we intend to continue working with the same collection, but we’re specifying that for this operation certain elements won’t be under consideration.

The example here is a good one for demonstrating the difference; because `filter` returns a brand new collection, the indices will be entirely different from the original collection. In reflection of this, `Iterators.filter` isn’t even indexable. `skip`, on the other hand, does what we want here.

``````julia> using Skipper
data = collect(1:10)
findmin(skip(<(4), data))
(4, 4)

julia> findmin(Iterators.filter(!<(4), data))
ERROR: MethodError: no method matching keys(::Base.Iterators.Filter{ComposedFunction{typeof(!), Base.Fix2{typeof(<), Int64}}, Vector{Int64}})
``````
1 Like

In other words, `skip` makes a view while `filter` and `Iterators.filter` makes a copy?

Sort of. I’d say `skip` creates a view of the original collection. In contrast, `filter` creates an entirely new (copied and modified) collection, and `Iterators.filter` creates a view of what the new collection will look like without actually collecting it.

When I think about it though, outside from the fact that we want generators to resemble comprehensions, I can’t think of any cases that you’d use `Iterators.filter` where you couldn’t use `skip`, but I can think of plenty of cases where you’d use `skip` where you couldn’t use `Iterators.filter`. Maybe a lazy `filter` was never quite the right idea to begin with?

Either that, or maybe the wrong idea is actually eager `filter`: by implicitly creating a new collection it has trained us to believe a filtered collection is a new collection—maybe `Base.filter` should actually be lazy and act like `skip` with its predicate inversed, and you’d have to explicitly call `collect` for the indices to change. That would be more efficient when building filter->map->reduce chains, and by preserving indices it’d also make `skip` unnecessary.

Definitely brings to mind @jakobnissen’s complaints about Julia’s FP primitives.

I don’t think anybody has opened an issue or PR.

2 Likes

I’ll just mention the opposite of `skip` could be `keep`. I know some people find the name “filter” confusing since it sounds like it’s filtering out but it’s the opposite.

1 Like

Would it be possible to transition Julia’s generator and comprehension syntax to lower to a `Keep` type instead of a `Filter` type?

If anything could justify a Julia 2.0, it might be the `Base` FP primitives. They’re FUBAR and guide people into all sorts of performance traps. Hard to claim your language is performant, if the base primitives guide the ecosystem into making slow code. And for a language whose raison d’être is performance, that’s a matter of existential import.

As an aside, `collect` is FUBAR too. Its current role should be satisfied by the `Array` constructor, and then `collect` could be used for collecting all sorts of other collection types.

1 Like

If that happens, I’d also consider changing the Base functions to enable error handling and possibly passing memory allocators.

To find only the index, there is `argmin`.
Using the functions in the OP:

``````# 22 ns (0 allocs: 0 bytes)
argmin(filterdata(d) ? dataobjective(d) : +Inf for (_,d) in pairs(data))
``````
1 Like

`Iterators.filter` is also lazy. The difference is that `filter`s don’t keep the original indices, while `skipmissing` in `Base` and `skip` in `Skipper` do keep. `filter` cannot keep original indices by design.

1 Like

regarding performance the following version

``````last(reduce((s,c)->(c[1]>.5 && c[2]<s[2]) ? (s[1]+1,c[2],s[1]+1) : (s[1]+1, s[2],s[3]), data, init=(0,Inf,0)))
``````

seems to rival even a for loop form

``````

julia> @btime last(reduce((s,c)->(c[1]>.5 && c[2]<s[2]) ? (s[1]+1,c[2],s[1]+1) : (s[1]+1, s[2],s[3]), \$data, init=(0,Inf,0)))
10.000 ns (0 allocations: 0 bytes)
3

julia> @btime argmin(i -> filterdata(\$data[i]) ? dataobjective(\$data[i]) : +Inf, axes(\$data,1))
23.370 ns (0 allocations: 0 bytes)
3

julia> data = [(rand(rng), rand(rng)) for _ in 1:10^6];

julia> @btime argmin(i -> first(\$data[i])>0.5 ? last(\$data[i]) : +Inf, axes(\$data,1))
4.612 ms (0 allocations: 0 bytes)
669781

julia> @btime last(reduce((s,c)->(c[1]>.5 && c[2]<s[2]) ? (s[1]+1,c[2],s[1]+1) : (s[1]+1, s[2],s[3]), \$data, init=(0,Inf,0)))
1.784 ms (0 allocations: 0 bytes)
669781

julia> @btime argmin(i -> filterdata(\$data[i]) ? dataobjective(\$data[i]) : +Inf, axes(\$data,1))
4.668 ms (0 allocations: 0 bytes)
669781
``````
``````function ffm(arr)
i,im, min=0, 0, Inf
for e in arr
i+=1
if e[1]>.5 && e[2]<min
im, min = i, e[2]
end
end
im
end
julia> @btime ffm(\$data)
1.804 ms (0 allocations: 0 bytes)
669781
``````

Is there a running list of things that should be fixed in 2.0? And is there a Julia fork that would allow package devs to start porting their code over to it?

If you were designing this interface from scratch, starting with the lazy methods instead of the eager methods, would you make `filter` keep the original indices?