# Sorting Policy

What we have now:

Right now `Base/Sort.jl` has a sorting policy implementation that is somewhat counterintuitive to me.

I see three levels of function call:

1. Keyword: `sort([2,3,7], by=x->floor(sqrt(x)), rev=true)`, with 10 methods:
   `sort`, `sort!`, `partialsort`, `partialsort!`, `sortperm`, `sortperm!`, `partialsortperm`, `partialsortperm!`, `sort` on a Matrix/Array, and `sort!` on a Matrix/Array.
2. 3-argument: `sort!([2,3,7], QuickSort, By(x->floor(sqrt(x)), Reverse))`, with 5 methods:
   `sort!`, `fpsort!`, `sort_int_range!`, `partial_sort!`, and `sortperm_int_range`.
3. 5-argument: `sort!([2,3,7], 1, 3, QuickSort, By(x->floor(sqrt(x)), Reverse))`, with 1 function dispatched on 4 algorithm types:
   `InsertionSort`, `QuickSort`, `PartialQuickSort`, and `MergeSort`.

Which of these levels is part of the interface? I'd like it to be just the first, and only the first is documented, but `sort!` is exported, and so the 3- and 5-argument versions are also exported.
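For concreteness, the keyword-level call from level 1 evaluates like this (plain Julia, nothing internal assumed):

```julia
# by maps 2 and 3 to 1.0 and 7 to 2.0; rev=true puts the largest key first.
# Small arrays use insertion sort, which is stable, so 2 and 3 keep their
# original relative order.
sort([2, 3, 7], by = x -> floor(sqrt(x)), rev = true)  # returns [7, 2, 3]
```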

`sort`, `partialsort`, and `partialsortperm` internally call `copymutable` and then their corresponding mutating versions at the keyword level.

`sortperm!`, `partialsortperm!`, and `sort!` on Matrix/Array internally call the 3-argument method of `sort!`.

Similarly, `partialsort!` calls the 3-argument method of `partialsort!`.

This leaves `sort!`, `sortperm`, and `sort` on Matrix/Array with non-passthrough behavior: `sort!` dynamically chooses between the 3-argument `sort!` and `sort_int_range!`. Similarly, `sortperm` dynamically chooses between the 3-argument `sort!` and `sortperm_int_range`. Finally, `sort` on Matrix/Array jumps via `sort_chunks!` to the 5-argument `sort!`.

At the 3-argument level, `sort!` specializes to sometimes statically dispatch to `fpsort!`. All of `sort!`, `fpsort!`, and `partialsort!` then call the 5-argument method of `sort!`.

This structure conflates the presence of `lo` and `hi` with policy decisions such as whether to perform a floating-point sort…

What I propose:

I propose to eliminate the 5-argument methods. If we want to talk about the 5th through 63rd elements of a 700-element `v::AbstractVector`, we should use `view(v, 5:63)::AbstractVector`, not `v::AbstractVector, 5::Integer, 63::Integer`.
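The view-based replacement is already expressible in plain Julia today, with no internals involved:

```julia
# Sorting only a subrange by passing a view instead of lo/hi indices.
v = [9, 4, 7, 1, 8, 3, 6]
sort!(view(v, 2:5))   # sorts elements 2 through 5 in place
v                     # [9, 1, 4, 7, 8, 3, 6]: only v[2:5] was reordered
```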

The 5-argument/3-argument distinction serves a second (counterintuitive) purpose of facilitating dispatch to `fpsort!`. I propose to replace this with a new `Algorithm`:

```julia
struct DetectSpecialCases <: Algorithm
    default::Algorithm
end
```

and set `const DEFAULT_STABLE = DetectSpecialCases(MergeSort)` and `const DEFAULT_UNSTABLE = DetectSpecialCases(InsertionSort)`.

We then move (almost*) all policy decisions into the method `sort!(v::AbstractVector, a::DetectSpecialCases, o::Ordering)` by defining

```julia
sort!(v::AbstractVector{<:Union{Float32, Float64}}, a::DetectSpecialCases, o::DirectOrdering) =
    fpsort!(v, a, o)
function sort!(v::AbstractVector, a::DetectSpecialCases, o::Ordering)
    if should_int_range(v, o)
        sort_int_range!(v, o)
    else
        sort!(v, a.default, o)
    end
end
```
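The `should_int_range` predicate above is not an existing Base function; a minimal sketch of what such a check might look like, with illustrative thresholds rather than Base's actual ones (the ordering argument is ignored here, assuming a direct Forward ordering):

```julia
# Hypothetical helper: decide whether a counting-sort-style pass over a
# small integer range is worthwhile. Thresholds are illustrative only.
function should_int_range(v::AbstractVector{<:Integer}, o)
    isempty(v) && return false
    mn, mx = extrema(v)
    span = widen(mx) - widen(mn)   # width of the observed value range
    span < 2 * length(v)           # narrow enough for a counting pass
end
should_int_range(v::AbstractVector, o) = false  # other eltypes: never
```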

The keyword `partialsort!` method would directly call the 3-argument `sort!` method.

For backward compatibility we could define a deprecated `sort!(v::AbstractVector, lo::Integer, hi::Integer, a::Algorithm, o::Ordering) = sort!(view(v, lo:hi), a, o)`.

I think this explicit handling of special-case detection is clearer and more easily usable by folks who have domain knowledge but don't want to use undocumented or unexported functionality, by Base developers, and even by folks who want to use internals. It would be easier, for example, to correctly insert a hook for dispatching to `RadixSort`. Hopefully, searching for `DetectSpecialCases` will be an effective way to understand Base's internal sorting policy decisions.

Some objective impacts:
- `sort(rand(1:10, 100), alg=MergeSort)` would use merge sort rather than counting sort.
- `partialsort(rand(100), 1:4)` would utilize the `fpsort!` optimizations.
- `sort(rand(10,10), dims=1)` would utilize the `fpsort!` optimizations.

*The one remaining policy decision is the dispatch of `sortperm` (not `sortperm!`) on Vectors (not AbstractVectors), of eltype `<: Integer`, sorted in Forward order, with a range strictly smaller than the type's range (not `sortperm(rand(UInt8, 10_000))`), and smaller than half the length of the collection (neither `sortperm(rand(Int16, 10_000))` nor `sortperm(rand(1:100, 150))`).
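That remaining condition can be written out as a predicate; the following is a hypothetical sketch (the name and exact comparisons are mine, not Base's):

```julia
# Sketch of the sortperm special case described above: integer Vector with
# an observed range strictly narrower than the eltype's full range and
# smaller than half the collection's length (Forward order assumed).
function sortperm_int_range_applies(v::Vector{T}) where {T<:Integer}
    isempty(v) && return false
    mn, mx = extrema(v)
    span = widen(mx) - widen(mn)
    typespan = widen(typemax(T)) - widen(typemin(T))
    span < typespan && span < length(v) ÷ 2
end
```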

Thanks! I'll look into it!

For the record, Julia's default algorithms are already as fast as or faster than the performance figures they claim.

They say "For one million 32/64/128-bit numbers, our code running on Apple M1 can produce sorted output at rates of 499/471/466 MB/s".

On Julia 1.9.0-DEV.439 on my 2019 Intel i5 Mac, we have

```julia
julia> sizeof(rand(Int, 10^6)) / (@belapsed sort!(x) (setup=(x=rand(Int, 10^6))))
3.0632643347081906e8
```

for 306 MB/s

306 is a bit lower than 471, but this is more than explained by CPU difference: "In some tests, the M1 shows a slight improvement, but in most tests, the M1 blows the Intel version away." (source)

good!

On the M1 Mac I get

```julia
julia> sizeof(rand(Int, 10^6)) / (@belapsed sort!(x) (setup=(x=rand(Int, 10^6))))
6.680724023466043e
```

so considerably faster

Huzzah! Thanks for the benchmark.

Their technique seems related to what's going on in Stabilize, optimize, and increase robustness of QuickSort by LilithHafner · Pull Request #45222 · JuliaLang/julia · GitHub. One difference is that I'm not particularly interested in dealing with `@simd` and friends right now, so I'm writing ordinary for loops and waiting for the wonderful folks in compiler optimizations to do it for me.

I initially thought we're faster than Google because we don't use quick sort, but then

```julia
julia> Base.Sort.defalg(rand(Int, 10))
Base.Sort.QuickSortAlg()
```

so I'm confused: what are we doing that's so good?

You're probably running a Julia version before v1.9.0-DEV. We're faster on 1.9 because we use radix sort.

That's what I thought. Which makes the "Julia is faster" comparison pointless, because it's not quick sort…

Yes and no. You could instead argue that manual SIMD optimization of quicksort is the side that's pointless, since for the places you would use it, you could just use radix sort instead.
