Can `findmin` use a counter if `keys` is not defined for an iterator?

jishnub · January 17, 2023, 5:53am

I want the index for which an iterator itr is minimized. I had expected

_, ind = findmin(f, itr)

to return the index, but this doesn’t work if keys(itr) is not defined. What does work is:

ind, _ = argmin(((ind, v),) -> f(v), enumerate(itr))

To take an example:

julia> z = zip([2,4], [6,1]);

julia> findmin(z) do (a,b)
           a + b
       end
ERROR: MethodError: no method matching keys(::Base.Iterators.Zip{Tuple{Vector{Int64}, Vector{Int64}}})

julia> argmin(enumerate(z)) do (ind, (a,b))
           a + b
       end
(2, (4, 1))

I understand that findmin returns the index of a value in the domain, whereas argmin returns the value in the domain for which the argument is minimized. However, this sounds like splitting hairs over what the definition of an index is, and whether an index means anything unless you may use it to index into the domain. What I want here is not really an index, but it may be referred to as a count, as I just want to run a counter and return the corresponding value for which the function is minimized. I’m assuming that even though the domain may not be indexed into, one may iterate over it. The argmin variant achieves exactly this. My question is, can findmin be made to behave analogously, or is it reserved for iterators where the domain supports indexing?

Related: findmin with filtered iterators fail · Issue #47124 · JuliaLang/julia · GitHub

sijo · January 17, 2023, 10:24am

That’s the conceptual difference between findmin and argmin: do you want an index in the domain, or a value of the domain? I don’t think that’s splitting hairs. In your case the domain is not indexable so I think it’s neat that you can still do what you want by using argmin with a purpose-built domain that includes the “count”.

Maybe what’s missing from Base is a simple way to wrap an iterator in a type that defines pairs/keys, for example

julia> Base.pairs(it::Iterators.Enumerate) = Iterators.map(Base.splat(=>), it);

julia> z = zip([2,4], [6,1]);

julia> findmin(enumerate(z)) do (a,b)
           a + b
       end
(5, 2)

rafael.guerra · January 17, 2023, 12:43pm

The following seems to produce the same result faster:

findmin(x -> x[1] + x[2], collect(z))

Does it make any sense?

jishnub · January 17, 2023, 12:52pm

collect works, but I was hoping that I won’t need to materialize the iterator

rafael.guerra · January 17, 2023, 12:55pm

Alright, but where does the materialization occur in @sijo’s code, as it is allocating more?

jishnub · January 17, 2023, 1:10pm

That’s the sort of thing that I had in mind, but I wonder if pairs allows this? The docstring says

Return an iterator over key => value pairs for any collection that maps a set of keys to a set of values.

enumerate doesn’t exactly map keys to values, it allows iterating over both parallelly. In a loose sense, though, there does exist a mapping.

sijo · January 17, 2023, 1:44pm

I think the splat is slow, this seems faster than collect:

z = zip([2,4], [6,1])

Base.pairs(it::Iterators.Enumerate) = Iterators.map(it) do (i, el)
   i => el
end

findmin(x -> x[1] + x[2], enumerate(z))

sijo · January 17, 2023, 1:47pm

jishnub:

That’s the sort of thing that I had in mind, but I wonder if pairs allows this? The docstring says
Return an iterator over key => value pairs for any collection that maps a set of keys to a set of values.
enumerate doesn’t exactly map keys to values, it allows iterating over both parallelly. In a loose sense, though, there does exist a mapping.

I agree it’s not clear cut, but for me the output of enumerate is indeed a mapping (I think the doc here is using plain English, not a formal definition of “mapping” that would require indexing operation to work).

Also it seems hard to argue that Generator can have keys but Enumerate should not…

rafael.guerra · January 17, 2023, 1:56pm

Brilliant, it is faster and with 0-allocations now

sijo · January 17, 2023, 4:49pm

For reference, PR submitted here.

rocco_sprmnt21 · January 18, 2023, 6:24pm

I confess that I have entered an area that I know little about.
But what do you think of the following idea?
I did some tests and it seems to give the expected result in times comparable to the other proposals.
But I’m not sure if that’s a generally valid solution, beyond the example tested.
Any comments are welcome in order to better understand how things are going.

Base.keys(it::Iterators.Zip) = Iterators.map( identity, 1:length(it))

# Base.keys(it::Iterators.Zip) = Iterators.map( identity, size(it))

jishnub · January 19, 2023, 4:49am

I guess keys don’t make sense for a Zip, as there isn’t a direct key-value relationship. It makes more sense for an Enumerate, although even there the key is a counter, and not something that you may use to index into the parent iterator

sijo · January 19, 2023, 7:45am

Yes I think keys make more sense for Enumerate as the enumeration assigns a kind of ID to each value. And it makes sense that you can wrap any iterator with enumerate to gain this functionality. Otherwise you would have to define methods for all iterators. Or to also cover user iterators you would need to define a generic fallback Base.keys(it), which was turned down here for good reasons I think.

Also for the problem in this thread, defining only keys wouldn’t work for iterators that have no length defined, while enumerating the pairs always works.

Topic		Replies	Views
Findmin/max and generators General Usage	13	1023	October 11, 2022
Why does Base.findmin assume that Combinatorics.Permutations has a `keys` method? General Usage combinatorics	3	343	June 14, 2022
Find minimum argument with custom comparator General Usage question	6	1297	October 1, 2020
Why is minimum so much faster than argmin? Performance	11	2032	August 23, 2021
Minimum by key function New to Julia question	2	1112	February 25, 2021

Can `findmin` use a counter if `keys` is not defined for an iterator?

Related topics