Understanding issorted's lt keyword

The documentation for Base.isless does mention “total order” (Understanding issorted's lt keyword - #2 by goretkin) . And isless is the default argument. The connection isn’t as clear as could be, but it’s there.

The default argument creates(?) a total ordering, but it doesn’t follow that all arguments must do so.

Oh, well, it’s getting late. Maybe sleeping on it will help.

1 Like

I don’t see a ranking of fundamentalness between < and <=. It’s possible to define either in terms of the other. A total order is normally defined in terms of <=.

Checking the strict-sortedness of a collection that allows duplicates seems like a type error. isstrictlysorted(x::OrderedSet) is fine but isstrictlysorted(x::Vector) doesn’t make much sense, or at least is error prone.

1 Like

My contention is that issorted already contains that loop. What I’m not sure about is if I would be I’m exploiting documented behavior or an implementation detail by using issorted( ..., lt = <=).

Yes, both choices are possible. And a choice has already been made. This allows Base to have fallback methods.

Can you explain what you don’t see here in the previous post? You can also just look at the docstrings for isless, <, <=, etc.

To be precise, my claim was about isless and <=, not < and <=. And to be less precise, I did use scare quotes around fundamental. I would not say one of these concepts are inherently more fundamental than the other, but a choice has been made to define (by default) comparisons in terms of isless, and not e.g. to define isless in terms of <=.

A user is able to make another choice for their own type. They could define <= first, and then define isless in terms of <= and ==. Their method definition for isless will look generic, but because the user has gone against the design intention of the generic functions, these methods must be limited to their own type, otherwise there will be method ambiguities.

I am really struggling to understand the confusion here. Some people are wishing that issorted was defined in terms of something like <=. Okay, but it’s not. That it is defined in terms of something like isless is reasonable.

3 Likes

I don’t think running isstrictlysorted(::ArrayWithDuplicates) is usually the user’s intent when they run issorted(lst), and having footguns lying around is a bad property in a language.

I’m arguing we should change it. Several ideas arise.

  1. Rename issorted to isstrictlysorted and introduce isweaklysorted which behaves in the expected way
  2. Error on collections that contain duplicates.
  3. Error on duplicate-allowing collection types. E.g. error on Vector, allow for OrderedSet.

I prefer not to error (2,3) because it reduces genericness. Having explicit function names (1) is better.

(Sorry about all the edits.)

First of all, what does this mean? Secondly, just because the answer to something is false, doesn’t mean it’s unreasonable to ask the question.

Please show an actual piece of code that you think is a footgun.

I am going to guess that it’s this:

julia> issorted([1, 1], lt = <=)
false

And the footgun is that a user expects that it returns true? The user will have to have read the docstring for issorted to even know the keyword argument is called lt:

Test whether a vector is in sorted order. The lt, by and rev keywords modify what order is considered to be sorted just as they do for sort.

So, after reading that, the user passes in lt = <=. What is the logical or emotional justification for a user doing that?

Yeah I deleted that line :slight_smile:

This edit still doesn’t help me understand what you meant.

Current behavior:

julia> issorted([1, 1])
true

it’s not a test of whether the list (which, by the way, contains duplicates) is “strictly sorted”.

Sorry for the confusion, I wrote inaccurately above. The unexpected behavior is

julia> issorted([1,1], lt=<=)
false

and more generally that issorted(sort(seq, lt=f), lt=f) isn’t necessarily true for collections with duplicates.

Maybe a solution is to provide an le= parameter.

It is true if you pass in appropriate choices for f. The solution is to document appropriate choices directly. Right now it’s quite indirectly documented in sort!:

the lt keyword allows providing a custom “less than” function

So the footgun, aside from the indirect documentation, is that <= seems like a reasonable choice for a “less than” function to you. So a solution would be to add e.g.:

Note, <= is not a “less than” function.

1 Like

Yeah, basically.

In other words, the issue is that it’s asking for a function that has certain properties when that isn’t what I expect it to be asking for, and it gives undesirable results when I give it something that I naively think is similar.

Putting a warning message on it may help sometimes, but designing the API so that it doesn’t need a warning message since it already conforms to expectations may be more successful.

I think it’s obvious that it is not possible to design an API that conforms to everyone’s expectations, though it is a worthy ideal to keep in mind. But there are other important ideals, such as consistency, and enabling generic programming. Julia Base has broken the “tie” between “define <= in terms of < or vice versa?”. The definition of issorted follows as a consequence, consistently.

You do not appear to be considering all of the other implications of changing the chosen design. Since you are not the OP, it also seems that you have not run into this problem yourself.

If you haven’t already, please read PSA: Julia is not at that stage of development anymore .

2 Likes

I guess that’s true.

Could you spell out why this follows?

I’ve just dug out all the history of the issorted function (which is surprisingly hard) and came to the conclusion that it’s fine. What transpired is, that at no point in this whole thing, using <= as an ordering inducing function was considered to be meaningful.

When the (precursor) of the current implementation lt(this, prev) (not literally) was introduced, sorting orders were actually determined by a pair of lt, le functions, but the second part was dropped, probably because it is somewhat redundant when one sticks to the requirements of “less than” for inducing order. It’s clear that there is no “less equallier” function belonging to <= like <= belongs to <.

Fast forward: how order is defined is actually documented by isless, much more clearly than what sort has to say about the lt keyword. The critical part is that the lt keyword gets tranformed to a Base.Lt, which is actually pretty clear about what you can meaningfully do with it:

help?> Base.Lt
  Lt(lt)

  Ordering which calls lt(a, b) to compare elements. lt should obey the same rules as implementations of isless.

So we should probably just make issorted and sort point to Base.Lt (or isless directly) for the explanation of the lt keyword and consider it done.

The fact that using lt = <= actually yields an equivalent of issorted && allunique is a peculiar implementation detail but not more then. (Answering the initial question) Exploiting this is probably still relying on an implementation detail but one that is unlikely to change anytime soon.

5 Likes

To get the desired behaviour:

using IterTools.partition
isinorder(x; order=<=) = all(t->order(t...), partition(x, 2, 1))

isinorder([1, 1, 2]) == true
isinorder([2, 1, 1]) == false
isinorder([1, 1, 2]; order=<) == false
isinorder ([1, 1, 2]; order=x->true) == true

This is in essence what I wrote 80 posts ago: Understanding issorted's lt keyword - #7 by goretkin :upside_down_face:

3 Likes

I’m fine with this being better documented, and not changed.

But I must say I am a bit baffled that it’s apparently so hard to see why the current behaviour is surprising. Understanding the current behaviour requires an extremely high level of mathematical sophistication. If you are capable of that then it makes sense (I presume, I’m only a Msc in mathematics, I don’t think this is within my intellectual reach), and I don’t mind that Julia normally chooses ‘correctness’ over intuition.

But after this many posts, it’s really strange that there is apparently zero understanding among some, such as @goretkin, that this

jl> issorted([1,2,2,3]; lt=<=)
false

jl> issorted([1,2,2,3]; lt=<)
true

is surprising and weird for people below the PhD level. It may be correct, but it is strange.

What all of this has to do with which is more ‘fundamental’ of isless or < is, unfortunately, also beyond me. It’s fine that isless is the default, because it’s more fundamental, but that alone doesn’t explain why <= cannot also be used.

1 Like

I would say it is an implementation detail that issorted(A, lt=<=) behaves as issorted(A) && allunique(A). It is definitely not documented behavior. It also doesn’t work if NaNs are involved:

julia> issorted([2, NaN, 1], lt=<=)
true

(NaN is not ordered with < and <=, which means that x < NaN, x <= NaN, NaN < x and NaN <= x return false for every x. That’s why lt=isless is the default, not lt=<.)

You can certainly use it (if you know that you don’t have to deal with NaN or have to distinguish between -0.0 and 0.0), but it might be more clear to write your own function for this. You could just copy the implementation of issorted and adapt it to your case.

What about adding something like the following to the sort! docstring:

If a custom lt function is supplied, it should behave like a “less than” function (as opposed to a “less than or equal” function), i.e., for every x and y, only one of lt(x,y) and lt(y,x) can return true.

2 Likes