max(x::T, y::T) and min must be commutative!

bug

#1

I could not believe it in julia Version 0.6.0 (2017-06-19 13:05 UTC) I got

julia> max(Set([1]), Set([2]))
Set([2])
julia> max(Set([2]), Set([1]))
Set([1])
julia> min(Set([1]), Set([2]))
Set([1])
julia> min(Set([2]), Set([1]))
Set([2])

Now, Set{T} where T does not have a canonical total order but a canonical partial order. Julia is aware of the difference, as the doc of < includes:
Types with a canonical partial order should implement <, and types with a canonical total order should implement isless.

The implementation of of max (for non-numerical types) comes to max(x,y) = ifelse(y<x, x, y). This is perfectly valid for total orders but wrong for partial orders, as in the case of Set.

So max(x, y) = ifelse(isless(y,x), x, y) seems more appropriate to me. It would result in:

julia> max(Set([1]), Set([2]))
ERROR: MethodError: no method matching isless(::Set{Int64}, ::Set{Int64})
Stacktrace:
 [1] max(::Set{Int64}, ::Set{Int64}) at ./REPL[115]:1
 [2] macro expansion at ./REPL.jl:97 [inlined]
 [3] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73

according to the imperative that it is better to obtain an error condition than to obtain an unreliable or arbitrary result.

In order to remove the Exception it would be possible to re-define the meaning of max. It should return the lowest upper bound, if it exists. For total orders and numerical types, that is equivalent to “the maximal element”. For non-numeric types implementing ‘<’ instead of isless the implementers should be encouraged to provide max and min to return “least upper bound” respectively “greatest lower bound”.

As far as I can see, in the Base package, only Set and IntSet are affected.
To following lines in base/set.jl would solve the requirement:

max(x::AbstractSet{T}, y::AbstractSet{T}) where T = union(x, y)
min(x::AbstractSet{T}, y::AbstractSet{T}) where T = intersect(x, y)

as can be seen:

julia> min(Set([1]), Set([2]))
Set{Int64}()
julia> min(Set([2]), Set([1]))
Set{Int64}()
julia> max(Set([1]), Set([2]))
Set([2, 1])
julia> max(Set([2]), Set([1]))
Set([2, 1])

#2

Just out of interest, what result should you expect when taking the max of two sets?

The relevant code appears to be

<( l::Set, r::Set) = (length(l) < length(r)) && (l <= r)
<=(l::Set, r::Set) = issubset(l, r)

and since max will return the second argument if < is false, it’s returning the second argument.

Edited to add:

struct Foo
    x::Int 
    y::Char  
end

<(a::Foo, b::Foo) = a.x < b.x
==(a::Foo, b::Foo) = a.x == b.x

a = Foo(1, 'a')
b = Foo(2, 'b')
c = Foo(1, 'c')

a < b     # true
b < a     # false
a == c    # true
max(a, c) # Foo(1, 'c')
max(c, a) # Foo(1, 'a')

#3

Thanks for your interest.
I would expect max(s1,s2) == union(s1,s2) and min(s1,s2) = intersect(s1,s2) for sets s1 and s2.
Actually max and min are not the correct names, if the order is partial, it should be upperbound or supremum resp. lowerbound or infimum instead, which would be the correct terms in order theory.
So my preferred solution would be to let max(s1,s2) throw an exception if s1, s2 are sets or do not define isless by any reason. At the same time I would define two of the proposed functions only for non-numericals, which do not define isless, but <.
The idea of re-using the names max, min was just for reasons of economy and name space pollution of the Base module.
There is a related discussion about “maximum, minimum, mapreduce …”, which goes the same direction.


#4

This would turn max at least into an O(n) operation with no short-circuiting.


#5

That is correct, union(s1,s2) and intersect(s1,s2) on sets are more expensive than s1 < s2 on sets in general. If length(s1) < length(s2) or if both are IntSet, the effort is comparable.
I would never trade performance for usability of results.


#6

What does the code example show? The wit of my original post was max(a,b) != max(b,a), even if a != b. That means max applied to sets is not commutative.


#7

My code was intended to show how one can define equality and not get obviously-commutative results from max (though I would argue that the results do show commutativity). I guess my point is this:

julia> Set([1]) < Set([2])
false

julia> Set([2]) < Set([1])
false

julia> Set([2]) == Set([1])
false

julia> Set([2]) < Set([1,2])
true

so therefore max is not well-defined on Sets, at least as far as providing an intuitive answer. I think this is because we’ve chosen to “overload” < to provide a short-circuiting issubset(), and this gets generalized into what we see with max, which is defined generally as max(x, y) = ifelse(y < x, x, y).

I agree that this definition of max has little obvious value to sets, but I’m not sure we should special-case a method just to throw an error. It’s not obvious to me why one would want to take the max of two sets in any case, and redefining max to be an alias of union is but one way of many to define max for a set.


#8

I do not agree with your arguments.
What I wanted to explain, that the current implementation of max is not ok for types, which implement a partial order (by defining <). The implementation max(x,y) = ifelse(y < x, x, y) makes max commutative only for totally ordered types (which are implementing isless).

'max` should never be non-commutative. But it is currently.

In order to guarantee commutativity, the proposal of max(x,y) = ifelse(isless(y,x), x, y).
If nothing else is changed, that throws an exception (islessis not defined) for the partially ordered types.

The second question is, if it is possible to assign a valid natural meaning to max.
That depends on the individual type. For AbstractSet that seems to me union, because union is the least-upper-bound function in the category of sets with their natural ordering relation. So if we implement, as has been done, < by issubset it is not worse to implement max by union.


#9

I found some comments about it. Since sort uses isless, I would agree with @klacru’s suggestion to do the same with max/maximum.


#10

I’d agree as well. Thanks for the discussion.


#11

Changing the definitions of min and max to use isless seems reasonable, although for non-floating-point types < and isless should be identical, so arguably the real bug here is that they differ for sets. The fact that < and <= are defined for sets is quite old and probably not a great idea – we have and operators that can be used for that. Defining min and max to compute intersection and union has a certain appeal, but I don’t think it’s what people would really expect, and as noted would be inefficient. It seems more appropriate to introduce generic meet and join operations.


Extend the domain of maximum and minimum to include empty collection?
#12

I should also note that making min and max perfectly commutative regardless of how isless is defined is not really practical – we have to assume that isless correctly defines a total order which is compatible with isequal in the sense that !isless(a, b) && !isless(b, a) if and only if isequal(a, b). Given that is true, min and max will be commutative in the sense that isequal(a, b) implies that isequal(min(a, b), min(b, a)) and isequal(max(a, b), max(b, a)). You may not have min(a, b) === min(b, a) however.


#13

Issue opened: https://github.com/JuliaLang/julia/issues/23094.


#14

I want to discus that a little bit. I restrict the consideration to non-numeric types.

That contradicts the documentation of < (see also isless).

<(x, y)

Less-than comparison operator. New numeric types should implement this function for two
arguments of the new type. Because of the behavior of floating-point NaN values, < implements
a partial order. Types with a canonical partial order should implement <, and types with a canonical total order should implement isless.

From the emphasised sentence, I conclude, that Types with a canonical partial order (and not a total order) must not implement isless. When trying to call isless for such types, a no-method-found exception should be thrown. When implementing isless, which is the indicator for a total order, the default implementation of Base.< in operators.jl:194 redirects it to isless. So for total orders, your statement is correct, for non-total orders it is wrong.
Set behaves according to these guidelines.

That may be the case or not (I would have done it the opposite way, probably). But if we want to stick to the documentation, we have to implement it for partial orders.

If we modify max and min in the proposed way, I have no problem with that, as long as nobody expects to use maximum and minimum for collections of Sets.
Let us introduce generic meet and join operators in place of min and max for partial orders! For AbstractSet they should call intersect respectively union. Would we also provide methods of the same names for a variable amount of arguments and for collections of the ordered type? The names for the new functions are still under discussion.

The generic implementation of max and min would achieve the === though.


#15

The name join is already used for other purposes, without a corresponding meet, whereas supremum and infimum, sup and inf, lub and glb seem quite free (or if used in packages, perhaps mean more or less the same as proposed here).


Extend the domain of maximum and minimum to include empty collection?
#16

I agree that the current documentation does indicate that < and <= be used for “very partial” orders – like Set. However, we can change the documentation if that seems like a bad idea. There was a massive overhaul of hashing and equality here:

The intention after that was for == and isequal to be the same with the exception of floating-point values like -0.0 and 0.0 and NaN. If == is going to match isequal it would seem sensible for < and isless to match as well. Perhaps @jeff.bezanson can chime in about whether he still feels that < should be used for partial orders or not.

I’m not sure if you’re saying that it should or that some proposed implementation would actually do so. I think this is considerably harder than one might imagine to do generically. Moreover, any definition of min/max which returns new mutable objects (e.g. unions or intersections) and not their actual argument values would not satisfy this for === so isequal is really the best one can hope for satisfying.


#17

Sure, I didn’t mean to use those names literally – “join” in particular is quite overloaded. infimum and supremum are good names and there are plenty of Unicode operators available including and . Also, there’s no need to define any of these in Base Julia, this seems like good material for packages.


#18

I prepared a working package SupremumInfimum, which implements sup and inf.
Feel free to try it out.

https://github.com/KlausC/SupremumInfimum.jl


#19

@StefanKarpinski: Does that mean the generic implementation of max should be:
max(x, y) = isless(y, x) ? (isimmutable(x) ? x : copy(x)) : isimmutable(y) ? y : copy(y)
?

BTW, the current implementation of max is:
max(x, y) = ifelse( y < x, x, y) which implies m === x or m === y. I was thinking of that one.


#20

It should be max(x, y) = ifelse(isless(x, y), y, x). Any definitions of isless and isequal should have the following properties (in addition to the usual total pre-order properties for isless and equivalence relation properties for isequal):

  1. For all x and y, at most one of isless(x, y) or isless(y, x) is true.
  2. For all x and y, isequal(x, y) is true if and only if !isless(x, y) && !isless(y, x).

As long as isless and isequal satisfy these properties, one can fairly easily see that isequal(max(x, y), max(y, x)) for all x and y.