Taking sorting & ordering seriously

stabbles · September 14, 2018, 7:26pm

Hi all,

I’m happy to share this proof of concept about refactoring the sorting and ordering API in Julia:

SortingSortingOut.jl

The basic ideas are:

To make Ordering objects more composable and reusable
To figure out a better way to dispatch on efficient sorting algorithms.

The following is one of the motivating examples where sort! does not dispatch on an efficient floating point sorting algorithm, because the signature is sort!(::Vector{Product}, ...) rather than sort!(::Vector{Float64}). In my package I try to dispatch on the inferred type that is being compared / sorted, which allows me to use the efficient floating point sorting algorithm .

Results

n	`sort!`	`sortsort!`	x faster
10_000	1.113 ms	560.6 μs	2.0x
1_000	66.00 μs	11.66 μs	5.7x
100	2.021 μs	615.6 ns	3.3x

Benchmark

using SortingSortingOut, BenchmarkTools

struct Product
    price::Int
    weight::Float64
end

weight(p::Product) = p.weight

function sort_products(n = 100)
    products = [Product(rand(1:100), 100rand()) for i = 1 : n]

    fst = @benchmark sort!(ps, by = $weight) setup = (ps = copy($products))
    snd = @benchmark sortsort!(ps, $(By(weight))) setup = (ps = copy($products))

    fst, snd
end

I’m hoping people could give feedback on this idea

StefanKarpinski · September 14, 2018, 7:41pm

Strongly approve. This stuff could really use some love. Improving the performance of sortperm would also be a great thing.

xiaodai · September 14, 2018, 10:21pm

Also happy to contribute my string sort algorithms WIP: faster string sort - #74 by xiaodai

tbeason · September 14, 2018, 10:25pm

Well, since you asked for feedback… I was really hoping for something more than 80% faster.

DNF · September 15, 2018, 6:03am

So, is it 80% faster, or 5x as fast?

JeffreySarnoff · September 15, 2018, 7:25am

I wrote this a while back: https://github.com/JeffreySarnoff/SortingNetworks.jl
(use master for faster tuple handling)

andyferris · September 15, 2018, 11:46am

Very nice I also agree that we could clean up this area a bit.

One thing I would to see is some way of indicating that a data source is already sorted appropriately, so that findall, findfirst and so-on will just use searchsorted, searchsortedfirst, etc. I think I’ve seen this rough idea mentioned before, somewhere. (This also came up in one of my side projects more recently, the SortIndex in AcceleratedArrays.jl does exactly this).

nalimilan · September 15, 2018, 12:34pm

andyferris · September 15, 2018, 1:28pm

That was probably it

stabbles · September 15, 2018, 5:07pm

Thanks all!

One of the things that makes sortperm slower is that it guarantees stability (indices of repeated values appear in ascending order). If that requirement is dropped, the implementation would just be

sortperm(xs, ord) = sortsort!(Vector(OneTo(length(xs))), By(i -> @inbounds(xs[i]), ord))

and it could automatically dispatch to specialized sorting algorithms in my proof of concept code .

That looks really nice! After reading the discussion on Github, I’m sure you have some ideas about what sort! should really dispatch on.

This took me a bit too long . Fortunately there are the absolute numbers as well ^^.

So should this eventually dispatch on sort(::NTuple)?

StefanKarpinski · September 15, 2018, 5:11pm

I suspect that stability could be patched up after the fact for sortperm: do an unstable sort and then just scan through the indices and sort each equivalence class of indices (equivalent in the sense that v[i] == v[j]). Of course, in the worst case, that’s sorting the entire range of indices, but maybe in practice it would be ok. It seems like it should be possible to take advantage of the fact that we know the exact range of index values… seems like the perfect use case for a radix sort.

foobar_lv2 · September 15, 2018, 5:48pm

Re sortperm performance / stability:

One could do it the same way as for normal sorts, i.e. it is only stable when the user requests it via the algorithm keyword.

For small bitstypes (or small results of By) and medium-sized arrays, it might make sense to sort! a temporary of (idx::Int, compareT::T) tuples, for better cache locality. I think this is what kills current sortperm performance.

JeffreySarnoff · September 15, 2018, 6:52pm

SortingNetworks master (pending merge into METADATA) already does dispatch on either N args or NTuples where N is in 1:16. And as to sort yes, only for small N (1…16, which are provably optimal as exchange networks, already ready).

foobar_lv2 · September 15, 2018, 10:15pm

So after giving it a try, I am very underwhelmed by the cache-local variant. Or, more precisely, I am absolutely shocked at how fast sortperm is, even though it should basically have no spatial locality at all.

julia> function _sortperm(A)
       tmp = [(@inbounds A[i],i) for i=1:length(A)]
       sort!(tmp)
       [t[2] for t in tmp]
       end
julia> v=rand(Int,10^8); sort(v[1:1000]); sortperm(v[1:1000]); _sortperm(v[1:1000]); 
julia> begin
       @time sort(v)
       @time sortperm(v)
       @time _sortperm(v);
       end;
 13.288238 seconds (6 allocations: 762.940 MiB, 0.70% gc time)
 42.778155 seconds (7 allocations: 762.940 MiB, 0.15% gc time)
 23.058652 seconds (13 allocations: 2.980 GiB, 0.47% gc time)

StefanKarpinski · September 15, 2018, 10:24pm

Sort is stable by default, which is the opposite.

foobar_lv2 · September 15, 2018, 11:51pm

Thanks! Then some zip-sort-extract variant is probably worthwhile for small bitstypes (as long as the temporary fits into memory), seeing a 3x speedup:

julia> function _sortperm(A)
              tmp = [(@inbounds A[i],i) for i=1:length(A)]
              sort!(tmp; alg=QuickSort)
              [t[2] for t in tmp]
              end
_sortperm (generic function with 1 method)

julia> v=rand(Int,10^8); sort(v[1:1000]); sortperm(v[1:1000]); _sortperm(v[1:1000]);

julia> begin
       @time sort(v)
       @time sortperm(v)
       @time _sortperm(v);
       end;
 13.137279 seconds (6 allocations: 762.940 MiB, 0.32% gc time)
 42.627918 seconds (7 allocations: 762.940 MiB, 0.00% gc time)
 16.295594 seconds (11 allocations: 2.235 GiB, 1.01% gc time)

PS. ~100KB - 1MB of buffer to sort appears to be an OK cross-over for the copy to amortize.

Topic		Replies	Views
[ANN] SortingLab.jl - fast sorting algorithms for strings and CategoricalArrays Package Announcements sort , sortperm	2	1112	January 15, 2019
10x faster sortperm() Performance sortperm	10	2706	October 29, 2024
Fast permutation vector: code needed Performance sorting	38	643	March 21, 2023
Which sorting algorithm should I use? Performance sort , sortperm	2	735	October 31, 2018
Ironic observation about `sort` and `sortperm` speed for "small integers" vs R Performance sort , sortperm , r	32	4669	February 4, 2018

Taking sorting & ordering seriously

SortingSortingOut.jl

Results

Benchmark

Related topics