Find filtered min/max index

Every now and then I need to find the index of the min/max element of a filtered data structure.
I was wondering, how do you normally tackle this quite common problem?

As a first reaction I think of findmin(o, filter(f, data)), but filtering doesn’t preserve the original indices.
I ended up using findmin(o, skipmissing([f(d) ? d : missing for d in data])), where I inject missing in place of the values that should be filtered out.


julia> using Random

julia> rng = MersenneTwister(0);

julia> data = [(rand(rng), rand(rng)) for _ in 1:10];

julia> dataobjective(d) = d[2];

julia> filterdata(d) = d[1] > 0.5;

julia> findmin(dataobjective, skipmissing([filterdata(d) ? d : missing for d in data]))
(0.5392892841426182, 6)

It’s very convenient that skipmissing propagates the original indices. Is there something like skipfilter that would do exactly what skipmissing does, but for elements satisfying a predicate?
And how would you solve the above problem?
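For reference, one direct way to keep the original indices is to carry them through a comprehension via pairs. A sketch with made-up data, and filterdata/dataobjective defined as above:

```julia
# Carry the original index through the filter by iterating `pairs`,
# which yields (index, value) for each element of the array.
data = [(0.8, 0.3), (0.2, 0.1), (0.9, 0.05), (0.6, 0.7)]
filterdata(d) = d[1] > 0.5
dataobjective(d) = d[2]

# `argmin(f, domain)` returns the element of `domain` minimizing `f`,
# so `best` is the (index, objective) tuple with the smallest objective.
best = argmin(last, [(i, dataobjective(d)) for (i, d) in pairs(data) if filterdata(d)])
```

Here best == (3, 0.05): index 3 in the original data, objective value 0.05.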

Check out @aplavin’s Skipper.jl, which provides an arbitrary-predicate skip that should totally be in Base imo.

minimum(t for t in zip(last.(data), eachindex(data)) if t[1] > 0.5)

the elegant @lmiq solution applied to the (modified :smile:) OP case

findmin(x -> x[2] > 0.5 ? x[2] : +Inf, data) 

You can use the predicate to filter, for example:

julia> data = collect(1:10);

julia> findmin(x -> x > 3 ? x : +Inf, data)
(4, 4)

or, similar to the OP example:

julia> data = [(rand(),rand()) for _ in 1:10];

julia> findmin(d -> d[1] > 0.5 ? d[2] : +Inf, data)
(0.42413123524471796, 8)

Okay, I read the OP more carefully:

minimum((t[1][2], t[2]) for t in zip(data, eachindex(data)) if t[1][1] > 0.5)


I also had trouble finding it with Google/Bing. It appears they don’t index GitLab as well as GitHub.


How about:

julia> withmissing = [filterdata(d) ? d : missing for d in data];

julia> findmin(map(withmissing) do x
           ismissing(x) ? (true, 0.0) : (false, dataobjective(x))
       end)
((false, 0.5392892841426182), 6)

This generalizes well to other cases, but needs a tiny addition to access the value. More importantly, it handles the all-missing case, which could otherwise cause trouble.
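For illustration, here is how an Inf-sentinel version and this (flag, value) version behave when everything gets filtered out. A sketch with a scalar predicate rather than the OP’s tuples:

```julia
data = [0.1, 0.2, 0.3]  # no element exceeds the 0.5 threshold

# Sentinel version: returns (Inf, 1), an index that looks valid but isn't.
v, i = findmin(x -> x > 0.5 ? x : +Inf, data)

# Flag version: tuples compare lexicographically, so any (false, _) beats
# (true, _); a leading `true` in the result means everything was skipped.
(flag, val), j = findmin(x -> x > 0.5 ? (false, x) : (true, 0.0), data)
```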

Or, the same as rocco_sprmnt21’s, but perhaps more readable:

minimum([(dataobjective(x), i) for (i, x) in enumerate(withmissing) if !ismissing(x)])

That should definitely go to Base. Why isn’t it?

Just checked it out:

julia> using Skipper

julia> data = collect(1:10);

julia> findmin(skip(x -> x < 4, data))
(4, 4)

julia> @btime findmin(skip(x -> x < 4, $data))
  21.599 ns (0 allocations: 0 bytes)
(4, 4)

Aside from inverting the predicate, what is the difference between skip and Iterators.filter?

The intent of filter is to construct an entirely new collection excluding the filtered elements, and Iterators.filter is a lazy variant.

The intent of skipmissing (and skip) is different: we intend to continue working with the same collection, but we’re specifying that for this operation certain elements won’t be under consideration.
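Concretely, skipmissing keeps the keys of the parent array, so reductions that report indices report them into the original collection:

```julia
v = [3, missing, 1, missing, 2]

# findmin over skipmissing returns the minimum and its index in `v` itself,
# not an index into some re-packed collection.
findmin(skipmissing(v))  # (1, 3)
```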

The example here is a good one for demonstrating the difference: because filter returns a brand new collection, the indices will be entirely different from the original collection. Reflecting this, Iterators.filter isn’t even indexable. skip, on the other hand, does what we want here.

julia> using Skipper
       data = collect(1:10)
       findmin(skip(<(4), data))
(4, 4)

julia> findmin(Iterators.filter(!<(4), data))
ERROR: MethodError: no method matching keys(::Base.Iterators.Filter{ComposedFunction{typeof(!), Base.Fix2{typeof(<), Int64}}, Vector{Int64}})
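For intuition, the index-preserving behavior boils down to scanning values together with their keys. A toy sketch of what findmin(skip(pred, data)) amounts to (not Skipper.jl’s actual implementation, which is lazier and more general):

```julia
# Toy loop: ignore elements matching `pred` while tracking the original key.
function findmin_skipping(pred, data)
    best_val, best_key = nothing, nothing
    for (k, v) in pairs(data)
        pred(v) && continue                     # skipped; keys stay intact
        if best_val === nothing || v < best_val
            best_val, best_key = v, k
        end
    end
    best_val === nothing && throw(ArgumentError("all elements were skipped"))
    return best_val, best_key
end

findmin_skipping(<(4), collect(1:10))  # (4, 4), matching the example above
```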

In other words, skip makes a view, while filter and Iterators.filter make a copy?

Sort of. I’d say skip creates a view of the original collection. In contrast, filter creates an entirely new (copied and modified) collection, and Iterators.filter creates a view of what the new collection will look like without actually collecting it.

When I think about it, though, aside from the fact that we want generators to resemble comprehensions, I can’t think of any case where you’d use Iterators.filter but couldn’t use skip, while I can think of plenty of cases where you’d use skip but couldn’t use Iterators.filter. Maybe a lazy filter was never quite the right idea to begin with?

Either that, or maybe the wrong idea is actually eager filter: by implicitly creating a new collection it has trained us to believe a filtered collection is a new collection. Maybe Base.filter should actually be lazy and act like skip with its predicate inverted, and you’d have to explicitly call collect for the indices to change. That would be more efficient when building filter->map->reduce chains, and by preserving indices it’d also make skip unnecessary.
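To illustrate the chain-efficiency point: the lazy Iterators versions fuse a filter->map->reduce pipeline into a single pass with no intermediate arrays. A sketch:

```julia
data = collect(1:1000)

# Eager: `filter` and `map` each allocate a fresh array before summing.
eager = sum(map(x -> x^2, filter(iseven, data)))

# Lazy: no intermediate collections; everything happens in one pass.
lazy = sum(x -> x^2, Iterators.filter(iseven, data))
```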

Definitely brings to mind @jakobnissen’s complaints about Julia’s FP primitives.

I don’t think anybody has opened an issue or PR.


I’ll just mention that the opposite of skip could be keep. I know some people find the name “filter” confusing, since it sounds like it’s filtering elements out when it’s actually the opposite.


Would it be possible to transition Julia’s generator and comprehension syntax to lower to a Keep type instead of a Filter type?

If anything could justify a Julia 2.0, it might be the Base FP primitives. They’re FUBAR and guide people into all sorts of performance traps. Hard to claim your language is performant, if the base primitives guide the ecosystem into making slow code. And for a language whose raison d’être is performance, that’s a matter of existential import.

As an aside, collect is FUBAR too. Its current role should be satisfied by the Array constructor, and then collect could be used for collecting all sorts of other collection types.


If that happens, I’d also consider changing the Base functions to enable error handling and possibly passing memory allocators.

To find only the index, there is argmin.
Using the functions in the OP:

# 22 ns (0 allocs: 0 bytes)
argmin(filterdata(d) ? dataobjective(d) : +Inf for (_,d) in pairs(data))

Iterators.filter is also lazy. The difference is that the filters don’t keep the original indices, while skipmissing in Base and skip in Skipper do. filter cannot keep the original indices by design.
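To see why, a small example: after filtering, the survivors are re-packed into a fresh vector, so findmin’s index refers to the new vector, not the original:

```julia
a = [1, 10, 5, 20]
b = filter(>(4), a)   # [10, 5, 20], a brand-new vector

findmin(b)  # (5, 2): index 2 in `b`, but 5 sits at index 3 in `a`
```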


Regarding performance, the following version

last(reduce((s,c) -> (c[1] > .5 && c[2] < s[2]) ? (s[1]+1, c[2], s[1]+1) : (s[1]+1, s[2], s[3]), data, init=(0, Inf, 0)))

seems to rival even an explicit for loop:

julia> @btime last(reduce((s,c)->(c[1]>.5 && c[2]<s[2]) ? (s[1]+1,c[2],s[1]+1) : (s[1]+1, s[2],s[3]), $data, init=(0,Inf,0)))
  10.000 ns (0 allocations: 0 bytes)

julia> @btime argmin(i -> filterdata($data[i]) ? dataobjective($data[i]) : +Inf, axes($data,1))
  23.370 ns (0 allocations: 0 bytes)

julia> data = [(rand(rng), rand(rng)) for _ in 1:10^6];    

julia> @btime argmin(i -> first($data[i])>0.5 ? last($data[i]) : +Inf, axes($data,1))
  4.612 ms (0 allocations: 0 bytes)

julia> @btime last(reduce((s,c)->(c[1]>.5 && c[2]<s[2]) ? (s[1]+1,c[2],s[1]+1) : (s[1]+1, s[2],s[3]), $data, init=(0,Inf,0)))
  1.784 ms (0 allocations: 0 bytes)

julia> @btime argmin(i -> filterdata($data[i]) ? dataobjective($data[i]) : +Inf, axes($data,1))
  4.668 ms (0 allocations: 0 bytes)

function ffm(arr)
    i, im, min = 0, 0, Inf
    for e in arr
        i += 1
        if e[1] > .5 && e[2] < min
            im, min = i, e[2]
        end
    end
    return im
end

julia> @btime ffm($data)
  1.804 ms (0 allocations: 0 bytes)
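Unpacked into named pieces, the reduce one-liner carries a (count, best value, best index) accumulator. A restatement for readability, with the same semantics:

```julia
# State: (elements seen so far, best objective so far, index of the best).
function filteredmin_reduce(data)
    step((n, best, idx), d) =
        (d[1] > 0.5 && d[2] < best) ? (n + 1, d[2], n + 1) : (n + 1, best, idx)
    _, _, idx = reduce(step, data; init=(0, Inf, 0))
    return idx  # 0 means no element passed the filter
end

filteredmin_reduce([(0.8, 0.3), (0.2, 0.1), (0.9, 0.05), (0.6, 0.7)])  # 3
```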

Is there a running list of things that should be fixed in 2.0? And is there a Julia fork that would allow package devs to start porting their code over to it?

If you were designing this interface from scratch, starting with the lazy methods instead of the eager methods, would you make filter keep the original indices?