I agree it should be removed but I don’t think it should be replaced. There are too many functions in Base, and this one is definitely confusing. If it’s gonna be anywhere, it should be in a package.
basically, given these four operations:
# in v1.7 beta
julia> a = [1,3,2];
julia> findmax(a) #1
(3, 2)
julia> findmax(inv, a) #2
(1.0, 1)
julia> argmax(a) #3
2
julia> argmax(inv, a) #4
1
we want to be able to get these pieces directly:
- the biggest value in the data (, #1)
- the index of 1. (, #1 or #3)
- the biggest evaluated value in data after evaluation. ( , #2 )
- the index of 3. (, #2)
- the original value of 3. (, #4)
so I’m fine with this I guess.
Yes, this is the lesson from all of these discussions: once the obvious core functionality is taken care of, API design is becomes increasingly harder and involves iterating on interfaces and learning from mistakes… which is practically impossible or very, very difficult with Base
(and the standard libraries), since once something end up there it has to be supported.
People mean well when they add new functionality into Base
, but the bar should be much higher for that than it is currently. IMO politely redirecting all such additions to a package (if feasible) should be the default.
You are wrong. Read the wikipedia page: Arg max - Wikipedia
It is unambiguous that argmax takes a function and a set, and returns an element of the set (or, all such elements, but programming languages tend not to do that). I don’t care about argmax(A)
; we could remove that as far as I’m concerned. We just can’t now since it would be breaking. (But note numpy has it, with the same meaning.)
There is no reason to think certain identities need to hold between argmax and findmax. Where is that coming from? A few bad definitions in packages do not overturn a long-standing mathematical convention.
Thanks for your friendly and welcome reaction!
This must be the most common argument given by those who come to this thread for the first time. And it was already addressed multiple times, not only by myself.
For example:
However, this is only true in isolation. For now, argmax in julia is the function with the only purpose: find index of max in the array. I don’t believe any amount of documentation can change this perception if argmax(A) remains: reaching for it is something many users would try even without reading any docs. And it works! Until they try to apply another pattern learnt from julia, that about reductions-taking-another-function: they may get a reasonable-looking result, but it would be wrong!
None of my arguments are based on identities between argmax and findmax, I believe. Instead, the consistency is between argmax(A)
and argmax(f, A)
: all other reductions in julia follow red(f, A) == red(f.(A))
if both methods are defined. Even argmax
’s close cousins findfirst/findmax/maximum
.
I just noticed that your are also present in the slack discussion, so probably better to continue there. It was just not clear for me at first, where the best place is.
@aplavin for what it’s worth, I think a few people are starting to get annoyed with the frequency and repetitiveness of the points you’re raising here any in other channels. I don’t know if that’s what coloured Jeff’s comment, but if you’re detecting people being impatient with you, I think this is a likely cause. It feels like your default assumption here when someone disagrees with you, is that they didn’t understand what you said, so you try to say it again. Sometimes though, people just have different priorities or understandings that cause them to reach different conclusions.
You’ve made your same point over and over again many times, in many places, and I think there’s a point where it becomes a bit inappropriate. I don’t know if you’ve reached that point yet, but I feel it’s getting close at least.
I think you’ve raised some good, valuable points here, but I think there’s a point where one has to just see what the decision makes decide after hearing your input, or open a PR to make the changes you want to see and accept the results.
I do not understand, can someone enlighten me please?
Read the wikipedia page: Arg max - Wikipedia
It is unambiguous that argmax takes a function and a set, and returns an element of the set (or, all such elements, but programming languages tend not to do that).
Arbitrary Vector
v is not a Set
, it is a special structure with mapping from indices to values, i.e. i \to v[i] . Or at least it can be treated as such a structure (and indeed we have getindex()
functions to support this mapping).
With that given, argmax(f, v::Vector)
can have two different, mathematically correct meanings:
- argmax(f, v) = \{v_0 \in v | f(v_0) = max f(v) \}
- argmax(f, v) = \{i_0 \in I | (f \circ \bar{v}) (i_0) = max f(v)\} where I is a set of indices and \bar{v}(i) defined as \bar{v}: i \to v[i].
So the question is really about definition of argmax over the functions composition, i.e. if we have set A and mappings f: A \to B and g: B \to C, then how one defines argmax
for g \circ f? It can be either argmax_{x \in f(A)} g or argmax_{x \in A} g \circ f. Both definitions are valid and do not contradict Wikipedia definition.
It looks like usability is defined by the context (what user defines as function domain)… It’s just that in case of Vector
one of the functions is implicit, but that doesn’t make it somehow wrong.
Interestingly enough, but it looks like argmax
is somewhat pathological function in that sense. For example, mean
or sum
do not have this ambiguity, i.e. sum_{x \in A} g \circ f == sum_{y \in f(A)} g (at least if f is surjection as in the case of getindex
). So, as a funny consequence of it, it may be possible that this whole time Julia developers were using second definition, while users thought that the first one is used. And only for argmax
this distinction became apparent.
simply because values(v)
in Julia returns something that has type name Vector
and happens to be index-able doesn’t mean it’s not compatible, in this context, with a mathematical Set, which only cares about values. In that sense, it is unambiguous: arg max returns the element from the “bag of values” (which are evaluated by function. There’s no need for another set of “index” for this definition
I agree this is unfortunate, but I just have to: no related decisions got changed before this topic, even the (really clear-cut!) findmax/findmin one. There was a GH issue before, but with no activity. Only after really reaching everyone around here that decision got changed and luckily we’ll have consistent findmax in 1.7.
It would be much better both for my general feelings and for julia if raising these extended discussions where many things get repeated would not be necessary - but we are where we are.
It feels like your default assumption here when someone disagrees with you, is that they didn’t understand what you said, so you try to say it again.
Clearly, this may be the case sometimes. But I find it hard to respond in another style when a person makes the very same argument already made several times, that I addressed. At least when doing so it would be great to directly refer to what is wrong in my previous explanations.
An interesting point you raised!
Speaking about sets, should we expect stuff like argmax(abs, Int16) == 32767
to work?
Because
A concrete type
T
describes the set of values whose direct tag, as returned by thetypeof
function, isT
.
Or maybe argmax(typemin, Real) == Bool
? Because
Abstract types cannot be instantiated, and serve only as nodes in the type graph, thereby describing sets of related concrete types
Ah, sorry, I put it wrong. Of course Vector
is set! But it’s much more than just a set, it’s a set plus some additional structure. So my questions are “why this structure is ignored? why we are saying that this structure is wrong? why we are saying that this structure contradicts mathematical definition?”
I mean, Vector is more rich structure than Set in the same sense as group is a semigroup plus two additional axioms. The fact that group is a seimgroup doesn’t mean that we should just ignore group structure, right? We would lose half of the mathematics this way. But why would we want to do it?
It’s perfectly normal to have some theorem and constructions for semigroups only and at the same time we have theorems which utilize group structures. There is nothing wrong with that.
I am not joking, I am really do not understand what is the problem, sorry.
I will give one reason first: because we want argmax
to work with actual Set
and in genera iterators
Without re-reading the entire thread (I know I made some very very good points way up there somewhere), I think my observation is that most people appear to agree that the optimal design here cannot be done without breaking changes (especially regarding argmax(A)
). I feel it is also important to point out that the severity of the issue seems rather low to me. So, the cost / barriers to change are quite substantial yet the payoff not too big…
I agree with a lot of what @aplavin is saying (via my earlier posts), but I also think that we can live with a findmax(f,A)
definition in Julia 1.x which is counterintuitive to some people (but not all) as long as it is clearly documented. It isn’t broken or wrong in that sense. Perhaps the entire set of findmax
and friends should be revisited come 2.0 though.
Thanks to this very topic, the findmax
inconsistency issue is already resolved! (:
I mean, there are still major differences between e.g. findfirst
and findmax
interface, but they are historical artifacts and not in scope here - for better or worse. This topic and issue only address the new behavior to-be-introduced in 1.7.
About a year has passed, and I got curious how the updated argmax
+ behavior is supported across the ecosystem. Turns out, even the most popular packages still get the function wrong!
For example, CSV.jl:
function csv_table(n)
buf = IOBuffer()
CSV.write(buf, (x=n:-1:1,))
seekstart(buf)
CSV.File(buf) |> columntable
end
# correct: the first element is the largest
julia> argmax(csv_table(10).x)
1
# wrong: the first element is still the largest
julia> argmax(csv_table(10000).x)
10000
Note, this bug is in the argmax(A)
method, but it was introduced when changing argmax(f, A)
to work like it does in julia 1.7. CSV versions older than a year don’t have this issue.
I’m not writing this post to point towards CSV.jl specifically, this is just a good illustration of how weird the new argmax behavior actually is. Even CSV with a large userbase and good testing didn’t catch that argmax does the wrong thing for a year.
Meanwhile, the sibling findmax
seems to be correctly implemented in all packages that do it. Makes sense, findmax’s 1- and 2-arg methods are consistent with how basically everything else works in Julia.
I don’t know what a SentinelArrays.ChainedVector{Int64, Vector{Int64}}
is, but there seems to be a workaround:
julia> argmax(csv_table(10000).x)
10000
julia> argmax(csv_table(10000).x, dims=1)[]
1
And now, three years after this thread and two years after the error was reported, that specific CSV bug is finally fixed (thanks @mkitti for pushing it recently!)
The confusing and error-prone argmax(X)
vs argmax(f, X)
behavior (introduced in 1.7) was only a part of the story, but I think a significant one. Testing is never perfect and complete, so it’s best to avoid introducing APIs with a strong potential for mistakes.
Anyway, now Julia has to live with this interface, even though it was noted as inconsistent at least several times during these years…