Extend the domain of maximum and minimum to include empty collection?

question
proposal

#1

When I ask for the maximum of an empty collection, there is no obvious result in general. For a collection of real numbers, nevertheless I would expect to obtain -Inf and not an exception.
Boiling down my use-case gave:

julia> maximum(Float64[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
 [1] _mapreduce(::Base.#identity, ::Base.#scalarmax, ::IndexLinear, ::Array{Float64,1}) at ./reduce.jl:265
 [2] maximum(::Array{Float64,1}) at ./reduce.jl:454
 [3] macro expansion at ./REPL.jl:97 [inlined]
 [4] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73

Similar results are obtained for minimum(Float64[]). Interestingly

julia> maximum(abs, Float64[])
0.0

works fine.
I found that I could tweak the behavior to the desired by defining adding methods

mr_empty(::typeof(identity),::typeof(scalarmax),::Type{T}) where T<:Real = typemin(T)
mr_empty(::typeof(identity),::typeof(scalarmin),::Type{T}) where T<:Real = typemax(T)

overriding the standard mr_empty(f, op, T) = _empty_reduce_error() of the Base module, defined in base/reduce.jl

Are there any objections in adding code similar to my proposed lines into base/reduce.jl ?


#2

For floating-point types, returning -Inf for the max of an empty collection seems reasonable. For integer types, returning typemin seems more questionable to me, as depending on the width of the type seems more like an implementation artifact than a mathematical result.


#3

Thanks for your response. I agree, that for AbstractFloat types, the suggested return values (-Inf resp. Inf) seem more reasonable that the typemin resp. typemax outputs for other Real types.
In the meanwhile, I think, the problem is even more general, I restrict to the maximumcase for now. In the minimumcase so analogous.

maximum(itr) should IMHO mimic the behavior of reduce(max, v0, itr), where it tries to guess the type and value of v0 depending on eltype(itr) if itr is empty. “v0 must be a neutral element for op that will be returned for empty collections.” according to the documentation of reduce. The neutral element of max is any lower bound on the type on which max operates. Now lower bounds are not implemented for all kinds of types, which implement max or <. So in the moment typemin is the best guess we have in the general case.
We could also introduce a new function like lowerbound or so, which explicitly provide the value per type.
Here a list of lower bounds for some types, not restricted to Real:

Type lower bound upper bound
Int32 ? ?
UInt32 UInt32(0) ?
Float16 -Inf16 Inf16
BigInt no representation of -∞ provided no representation of ∞ provided
BigFloat typemin(BigFloat) typemax(BigFloat)
Rational{T} -one(T)//zero(T) one(T)//zero(T)
String “” nothing
Set{T} Set(T[]) set of all elements of type T

The ? means: typemax returns a value, which may be considered an implementation artefact but does not represent the bound of a mathematical idealization. BigInt suffers from the missing implementation of infinities in GMP.
The Set example is arguable because <(x::Set,y::Set) implements the usual order relation for sets, but max(x::Set,y::Set}) should not be ifelse( y < x, x, y) but rather union(x,y).


#4

Another idea:

Interpret the docs of reduce consequently, which says:
"reduce(op, v0, itr)

Reduce the given collection ìtr with the given binary operator op. v0 must be a neutral element for op that will be returned for empty collections. It is unspecified whether v0 is used for non-empty collections.

Reductions for certain commonly-used operators have special implementations which should be used instead: maximum(itr), minimum(itr), sum(itr), prod(itr), any(itr), all(itr)

Unfortunately the optional v0 of reduce gets lost. So would it make sense to introduce a (keyword-) argument for maximum and minimum to delegate the decision about the neutral element to the user?

BTW: sum(itr), prod(itr), any(itr), all(itr) all do provide the neutral elements for their cases by means of mr_empty, so why not maximum?


#5

It would be a bit weird if maximum(itr) < minimum(itr)? And if maximum(itr) = a, but a !in itr.

Isn’t this a job for some sort of Nullable?


#6

No, that is not weird; at least if you are a mathematican!
Of course if itr is not empty, that would be, but not if itr is empty. Reason is, that max(-Inf, a) == a and min(Inf, a) == a for all a real a. You could also say "-Inf is the neutral element of the operator max on real types.
The following sentence should be easily acceptable:
“If you remove a number from a collection, the collection’s maximum should not grow.”

From this you can logically derive, that maximum(empty) == -Inf as you see here:
Let a::T be a value of type T. T should have a total ordering and < properly defined.
If you remove a from Set([a]) you get Set{T}[]. Then, by the accepted condition, maximum(Set{T}[]) <= maximum(Set[a]) == a for every a, from that we conclude maximum(Set{T}[]) <= a for all a of the Type T. So maximum(Set{T}[]) must be a lower bound of T. For numerical types the only defined lower bound is -Inf.
The same argument leads to minimum(Set{T}[]) == +Inf`.

The second question is if maximum(X) ∈ X is necessary. Again, for non-empty totally ordered finite sets X that is true. The general case needs generalization. For example, the set of all rational numbers, whose square is below 2 does not contain its maximum, because sqrt(2) is not rational. And maximum({}) is not contained in {}, because {} does not contain any element.


#7

There is no contradiction between ‘mathematically correct’ and ‘weird’. I do understand your argument, but maximum(itr) < minimum(itr) is seriously weird, and is likely to create some big surprises.


#8

That may be true, but why not accept weird results as a part of reality? I say that, as I am a mathematician. In my opinion mathematics lives amongst other things from counter-examples and surprises at the first sight.
So you pledge for “it does not make sense to define maximum for non-empty iterators, because some people could be surprised by the correct result”?

BTW, I was extremely surprised, when my call to maximum(empty) did not return -Inf. So the seriousity level of weirdness seems extremely subjective.


#9

maximum(itr) does explicitly return "the largest element of itr", so the docstring would’ve helped there.
Also I’d expect the supremum of {} to be -Inf, but not the maximum. But that’s just from what I remember from a couple of years ago :wink:


#10

My 2¢: Just like parse and tryparse are two versions of parsing which throw and not-throw an exception respectively, what we want here is two versions of maximum (resp. minimum).

Luckily, even mathematicians understand the difference and have two definitions for these: maximum and supremum. The maximum would throw on an empty collection and always return an element from the collection and supremum would always return a least upper bound or -Inf (or typemin).

infimum is to minimum like supremum to maximum.

Also we now have DNF’s asserions: maximum(itr) >= minimum(itr) always, and maximum(itr) ∈ collect(itr). Additionally, we have: Post maximum/minimum the collection is known to be non-empty (unless exception thrown), infimum/supremum never throw, and for each x ∈ collect(itr) it is true that x <= supremum(itr).


#11

That is cutting the Gordian knot! I did not dare to propose the introduction of two new general functions into Base!
But I would prefer to use infimum and supremum in place of minimum and maximum.
Should we also ask for lowerbound and upperbound for min and max for the case of partially orders types; or just re-use the new function names?


#12

Is an array even a set? Not as far as I can find: “A set is a well-defined collection of distinct objects.” Is that a controversial definition?

infimum and supremum are fine, but should not replace maximum and minimum


#13

But, anyway: is it more reasonable for maximum to throw, or to return a Nullable? Is this the sort of thing Nullables are for?


#14

No, any Array is not a Set, but both are iterables. The argument of maximum is an iterable itr.
One of the examples uses eltype(itr) == Set{T}, which is another topic.

infimum and supremum should not replace minimum and maximum, but they should extend the definition area (allowed argument values); actually only the empty iterator is added. As it seems to be difficult to extend the definition area of maximum or minimum, because of unexpected results in the extension area, the proposal to use the new names, which are mathematically better justified.


#15

Returning a Nullable is not a valid option for me. I would rather use supremum and work with the result (potentially -Inf) instead of having to investigate the state of the nullable object each time.
Imagine max(supremum(A), supremum(B)) for example, and compare with
a1 = maximum(A); a2 = isnull(a1) ? get(a1) : -Inf; b1 = maximum(B); b2 = isnull(a1) ? get(b1) : -Inf; max(a2, b2)


#16

I’m not used to working with Nullables, but I think

max(get(maximum(A), -Inf), get(maximum(B), -Inf))

might have worked.


#17

You could also just define supremum(x) = isempty(x) ? typemin(eltype(x)) : maximum(x) or similar without necessitating any changes to Base.


#18

This was in the hypothetical scenario where maximum returned a Nullable. And I was wondering/asking if that would be the ‘correct’ way for maximum to work.


#19

You are right, that would be a less ugly solution - I did not think of the default-value argument of get(::Nullable).


#20

Yes, I could do that; and indeed that is what I did, after I recognized the behaviour of maximum applied to empty iterables. I opened this topic, because I thought it worthwhile to improve the language in a way, which makes such small work-arounds unnecessary.

What inspired me to propose the modification of two the Base.mr_empty methods was the kind of implementation of maximum. It calls mapreduce(identity, Base.scalarmax, itr), which in turn calls mr_empty(identity, scalarmax, eltype(itr)).
So the mechanism is already in place in Base providing return values for empty iterators.

Actually, if the first argument of mapreduce is abs or abs2, the default value is zero of the appropriate type.
There is another method maximum(f::Function, itr), which is implemented in Baseas well, which calls mapreduce(f, max, itr).
So maximum(abs, itr) does exactly what I expect (it delivers the greatest lower bound of all numbers >= 0), when itr is empty.
At the same time maximum(abs.(itr)), which is equivalent to the former, if itr is not empty, throws an exception. So my approach to extend the domain of maximum to empty iterators (actually only one value is added to the domain) seemed to be natural to me. It would not break existing code, which does not rely on the exception to be thrown.
(And I cannot image a programmer who relies on the exception in order to figure out, if his iterable is empty.)