Understanding issorted's lt keyword

The lt keyword for issorted and sort! is defying my intuition. From reading the source code, I understand why I get the following results, but I am finding it counterintuitive.

From the documentation for sort!:

the lt keyword allows providing a custom “less than” function

Let’s try it:

julia> issorted([1, 2, 3, 3])
true

julia> issorted([1, 2, 3, 3], lt = <)
true

julia> issorted([1, 2, 3, 3], lt = <=)
false

julia> issorted( sort([1, 2, 3, 4, 3], lt = <=), lt = <=)
false

I would think confirm_sort(A, B) = issorted( sort( A, lt = B), lt = B) would always evaluate to be true.

Can anyone explain how to think of the lt keyword intuitively?

3 Likes
help?> issorted
search: issorted InsertionSort

  issorted(v, lt=isless, by=identity, rev:Bool=false, order::Ordering=Forward)

  Test whether a vector is in sorted order. The lt, by and rev keywords modify what order is considered to be sorted just as they do for sort.

help?> sort
[...] 

  sort(v; alg::Algorithm=defalg(v), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)

The default lt function is isless.

help?> isless
[...]

  isless(x, y)

  Test whether x is less than y, according to a fixed total order. isless is not defined on all pairs of values (x, y). However, if it is defined, it is expected to satisfy the
  following:

    •  If isless(x, y) is defined, then so is isless(y, x) and isequal(x, y), and exactly one of those three yields true.

Note that lt = <= does not satisfy that property, since for x == y, all three (lt(x, y), lt(y, x), isequal(x, y)) are true.

˜Perhaps more intuitively, ask yourself how you might implement issorted, and what you’d expect it to be able to do just given the <= function?~ [ Edit: I’m not sure what I had meant by this]

2 Likes

I don’t see how that explains

julia> issorted([1,1], lt=<=)
false
1 Like

Part of the reason I am interested in lt = <= is that I actually want to verify that an iterable is both sorted and unique. I want the sequence to be strictly monotonically increasing, not just sorted.

My intuition would have been to indicate lt = < to accomplish that, but that is wrong.

You are probably guessing at an implementation for issorted that is incorrect. It’s not that lt(x, y) must be satisfied for consecutive x and y. Otherwise issorted([1, 1], lt=isless) would be false.

What is it then?

I think I see the pattern you are following, but ultimately issorted is likely to be the wrong concept, since you’re not asking if a list is sorted, but you’re asking if it’s strictly increasing. I think you should think of issorted as a verifier for sort. There’s nothing that sort can do to make some data strictly increasing and still look like a sort (it would have to e.g. discard some data).

Consider something instead like:

julia> using IterTools: partition

julia> all_consecutive_pairs(pred, itr) = all(Base.splat(pred), partition(itr, 2, 1))
all_consecutive_pairs (generic function with 1 method)

julia> all_consecutive_pairs(<, [1, 2, 3])
true

julia> all_consecutive_pairs(<, [1, 3, 3])
false
2 Likes

This still doesn’t make sense to me.

julia> issorted([1,1], lt=<=)
false

Here is the source code for issorted:

https://github.com/JuliaLang/julia/blob/bb5b98e72a151c41471d8cc14cacb495d647fb7f/base/sort.jl#L56-L68

https://github.com/JuliaLang/julia/blob/bb5b98e72a151c41471d8cc14cacb495d647fb7f/base/sort.jl#L91-L93

Basically it goes over all the numbers. If the current number is “less than” the previous number, then it reports that the iterable is not sorted.

In the issorted([1,1], lt=<=) case, it instead asks if the current number is “less than or equal to” the previous number. If true, then it indicates the array is not sorted. The evaluation here boils down to

if 1 <= 1
    return false
end
1 Like

Thanks. Tbh that behavior still seems incongruous with the docs which say “Test whether a vector is in sorted order.”

1 Like

Ultimately, I think @goretkin is right that the implementation is key. Currently, the default is

if this < prev
    return false
end

The alternative would be as follows.

if !( this >= prev )
    return false
end

The second version seems less efficient since you have to do the >= comparison and then negate it. However, this is Julia, and we have a fancy compiler.

julia> function f(this, prev)
           if this < prev
               return false
           end
           return true
       end
f (generic function with 1 method)

julia> function g(this, prev)
           if !( this >= prev )
               return false
           end
           return true
       end
g (generic function with 1 method)

julia> f(1,2)
false

julia> g(1,2)
false

julia> f(2, 1)
true

julia> g(2,1)
true

julia> @code_llvm debuginfo=:none f(2, 1)
; Function Attrs: uwtable
define i8 @julia_f_816(i64 signext %0, i64 signext %1) #0 {
top:
  %.not = icmp sge i64 %0, %1
  %spec.select = zext i1 %.not to i8
  ret i8 %spec.select
}

julia> @code_llvm debuginfo=:none g(2, 1)
; Function Attrs: uwtable
define i8 @julia_g_818(i64 signext %0, i64 signext %1) #0 {
top:
  %.not = icmp sle i64 %1, %0
  %spec.select = zext i1 %.not to i8
  ret i8 %spec.select
}

Both versions compile to very similar code.

julia> @code_native debuginfo=:none issorted([1, 2, 3])
        .text
        pushq   %rbp
        movq    %rsp, %rbp
        movq    8(%rcx), %r9
        movb    $1, %al
        testq   %r9, %r9
        je      L57
        cmpq    $1, %r9
        je      L57
        movq    (%rcx), %r8
        movq    (%r8), %rdx
        movl    $2, %ecx
L32:
        movq    %rdx, %r10
        movq    -8(%r8,%rcx,8), %rdx
        cmpq    %r10, %rdx
        jl      L55
        cmpq    %r9, %rcx
        jae     L57
        incq    %rcx
        jmp     L32
L55:
        xorl    %eax, %eax
L57:
        popq    %rbp
        retq
        nopl    (%rax,%rax)
julia> import Base: issorted, lt

julia> function issorted(itr, order::Base.Order.Ordering)
           y = iterate(itr)
           y === nothing && return true
           prev, state = y
           y = iterate(itr, state)
           while y !== nothing
               this, state = y
               !lt(order, prev, this) && return false
               prev = this
               y = iterate(itr, state)
           end
           return true
       end
issorted (generic function with 5 methods)

julia> @code_native debuginfo=:none issorted([1, 2, 3], lt = <= )
        .text
        pushq   %rbp
        movq    %rsp, %rbp
        movq    8(%rcx), %r9
        movb    $1, %al
        testq   %r9, %r9
        je      L57
        cmpq    $1, %r9
        je      L57
        movq    (%rcx), %r8
        movq    (%r8), %rdx
        movl    $2, %ecx
L32:
        movq    %rdx, %r10
        movq    -8(%r8,%rcx,8), %rdx
        cmpq    %rdx, %r10
        jg      L55
        cmpq    %r9, %rcx
        jae     L57
        incq    %rcx
        jmp     L32
L55:
        xorl    %eax, %eax
L57:
        popq    %rbp
        retq
        nopl    (%rax,%rax)

Again, it looks like in practice there is not much difference for the given lt. Where this may make a difference is for an arbitrary lt that the compiler cannot easily negate.

you can’t always(?) trivially “flip it” since lt= can be any general binary function

This is an interesting find. I think the easiest way to handle this might be a new keyword, since it seems the implementation does appear to favor strict comparisons. Regardless of the outcome, I think a statement on how ties are handled belongs in the docstring.

4 Likes

Ultimately, I think @goretkin is right that the implementation is key. Currently, the default is

if this < prev
    return false
end

The alternative would be as follows.

if !( this >= prev )
    return false
end

The second version seems less efficient since you have to do the >= comparison and then negate it. However, this is Julia, and we have a fancy compiler.

julia> function f(this, prev)
           if this < prev
               return false
           end
           return true
       end
f (generic function with 1 method)

julia> function g(this, prev)
           if !( this >= prev )
               return false
           end
           return true
       end
g (generic function with 1 method)

julia> f(1,2)
false

julia> g(1,2)
false

julia> f(2, 1)
true

julia> g(2,1)
true

julia> @code_llvm debuginfo=:none f(2, 1)
; Function Attrs: uwtable
define i8 @julia_f_816(i64 signext %0, i64 signext %1) #0 {
top:
  %.not = icmp sge i64 %0, %1
  %spec.select = zext i1 %.not to i8
  ret i8 %spec.select
}

julia> @code_llvm debuginfo=:none g(2, 1)
; Function Attrs: uwtable
define i8 @julia_g_818(i64 signext %0, i64 signext %1) #0 {
top:
  %.not = icmp sle i64 %1, %0
  %spec.select = zext i1 %.not to i8
  ret i8 %spec.select
}

Both versions compile to very similar code.

julia> @code_native debuginfo=:none issorted([1, 2, 3])
        .text
        pushq   %rbp
        movq    %rsp, %rbp
        movq    8(%rcx), %r9
        movb    $1, %al
        testq   %r9, %r9
        je      L57
        cmpq    $1, %r9
        je      L57
        movq    (%rcx), %r8
        movq    (%r8), %rdx
        movl    $2, %ecx
L32:
        movq    %rdx, %r10
        movq    -8(%r8,%rcx,8), %rdx
        cmpq    %r10, %rdx
        jl      L55
        cmpq    %r9, %rcx
        jae     L57
        incq    %rcx
        jmp     L32
L55:
        xorl    %eax, %eax
L57:
        popq    %rbp
        retq
        nopl    (%rax,%rax)
julia> import Base: issorted, lt

julia> function issorted(itr, order::Base.Order.Ordering)
           y = iterate(itr)
           y === nothing && return true
           prev, state = y
           y = iterate(itr, state)
           while y !== nothing
               this, state = y
               !lt(order, prev, this) && return false
               prev = this
               y = iterate(itr, state)
           end
           return true
       end
issorted (generic function with 5 methods)

julia> @code_native debuginfo=:none issorted([1, 2, 3], lt = <= )
        .text
        pushq   %rbp
        movq    %rsp, %rbp
        movq    8(%rcx), %r9
        movb    $1, %al
        testq   %r9, %r9
        je      L57
        cmpq    $1, %r9
        je      L57
        movq    (%rcx), %r8
        movq    (%r8), %rdx
        movl    $2, %ecx
L32:
        movq    %rdx, %r10
        movq    -8(%r8,%rcx,8), %rdx
        cmpq    %rdx, %r10
        jg      L55
        cmpq    %r9, %rcx
        jae     L57
        incq    %rcx
        jmp     L32
L55:
        xorl    %eax, %eax
L57:
        popq    %rbp
        retq
        nopl    (%rax,%rax)

Again, it looks like in practice there is not much difference for the given lt. Where this may make a difference is for an arbitrary lt that the compiler does not recognize.

BTW, it looks more natural if we use ||.

1 Like

issorted([a,b]) is same as !isless(b,a),
so it checks whether [a, b] is non-decreasing;
OTOH, issorted([a,b], lt=<=) is just isless(a,b) and to check strictly increasing.
The default lt(b,a), not lt(a,b) make the situation a little confusing.

It seems very strange to me that strict comparison, <, does not require “strictly increasing”.

It’s also not great that one must “look at the implementation” to understand what issorted does. The current docs seem to assume that the behavior is intuitively obvious. To me the behavior is profoundly weird.

5 Likes

Maybe issorted() should have another keyword argument strict::Bool?

1 Like

I don’t think so; I think it should just do the obvious thing as documented, not this weird other thing that it currently does.

We should definitely document that lt must define a strict total order. Aside from the documentation issue, it is not clear to me what behavior you would prefer:

  • On one hand, you expect that issorted(sort(A, lt=x), lt=x) returns true for all x (which is the case if x defines a strict total order).
  • On the other hand, you want to have a my_lt (either < or <=) such that issorted(sort(A, lt=my_lt)) returns false when A contains non-unique elements.

So what should sort(A, lt=my_lt) do? Remove non-unique elements? It seems very counterintuitive to me that sort would delete elements from the vector.

4 Likes