Why does Base only have Merge Insertion and Quick sorting algorithms?

base/sort.jl has the algorithms InsertionSort, MergeSort, and QuickSort, yet I hear folks talk about counting, radix, and TimSort builtin to julia (e.g. https://julia-ylwu.readthedocs.io/en/latest/stdlib/sort.html for TimSort). I couldn’t find any other algorithms from methods(sort!) or methods(sort) within Julia’s standard library, but might be missing something.

Is there some reason Julia’s Base.sort! doesn’t use TimSort, dual-pivot quick sort, quadsort, or some-such as a default sorting algorithm with multiple dispatch to radix, counting, etc. for special cases?

I imagine that it takes quite some thought and discussion to pick a language/stdlib’s default sorting algorithm(s), but haven’t been able to find that discussion for Julia. Is anyone aware of where that discussion is?

2 Likes

The short answer is we haven’t done it yet because it’s a lot of effort and no one has bothered yet. I think the correct default is probably something along the lines of radix sort where applicable, and timsort otherwise. One thing that might be really interesting is combing radix sort with insertion sort/timsort for things like BigFloat. One round of MSD radix sort will often do a good job of producing nearly sorted sequences which insertion sort will do a very good job on.

2 Likes

See also https://github.com/JuliaCollections/SortingAlgorithms.jl

4 Likes

TimSort was implemented in Base at one point and it wasn’t actually the big win that it is in Python and Java. This may be because Julia tends to organize arrays quite differently from those languages, which both use pointer-heavy data structures whereas Julia tends to use inline (immutable) data structures more. That implementation was moved into SortingAlgorithms, where it still lives. It’s quite possible that the performance characteristics are different now since Julia’s compiler has evolved quite a bit as has the hardware that people use it on.

It would be a worthwhile project for someone to figure out what the best set of algorithms for Base would be. I agree that radix sort seems like it would be a win for floats and integers. TimSort might be better for some kinds of data structures or data patterns but not others. I’m not sure what the benefit of the dual pivot quick sort algorithm is supposed to be — I vaguely recall trying it and not seeing an improvement. I don’t think anyone has tried quadsort.

14 Likes

The short answer is we haven’t done it yet because it’s a lot of effort and no one has bothered yet.

Does that mean a PR with some more complicated and modern algorithms for base/sort.jl would be welcome?

One thing that might be really interesting is combing radix sort with insertion sort/timsort for things like BigFloat.

I love how beautifully multiple dispatch supports that special case.

2 Likes

Maybe do a performance study so that a decision can be informed but yeah.

2 Likes

My thought was to use timsort for non-isbits types, where it theoretically should do well.

2 Likes

Maybe do a performance study so that a decision can be informed

This seems like a necessary step. Does anyone know of real world sorting benchmark data or should I stick with the random, sawtooth, pip-organ, etc. from the literature?

2 Likes

See

SortingLab.jl

This Goes To 11

4 Likes

Also SortingAlgorithms.jl

It would be great to have more sorting algorithms implemented, but ideally, experimentation with sorting algorithms should happen in packages, possibly those mentioned in this topic.

Putting things in Base ties new releases to Julia release cycles and makes breaking changes to the API practically impossible. The language includes commonly used workhorses, but new sorting algorithms should only be added to Base if they are somehow a major improvement for generic sorting problems (unlikely, but who knows), or of they are needed for something that Base or the standard libraries require.

3 Likes

We can change out sorting algorithm without affecting the API. The main thing to get out of this isn’t allowing user customization of sorting algorithm, but just picking better defaults.

3 Likes

and also:

simd-sorting/fast-and-robust

ips4o

1 Like

and also:

simd-sorting/fast-and-robust

ips4o

Thank you for these links! They look like impressive alternatives for where Julia currently uses QuickSort.

I’m planning to start with a stable sorting algorithm to replace MergeSort. QuadSort (which claims to be faster than TimSort for pretty much all inputs) is the top contender I’ve found so far.

I am trying to call boost::sort::block_indirect_sort using boost_jll :

That algorithm is very fast and multi-threaded.

https://github.com/BonzaiThePenguin/WikiSort

[and GrailSort linked from there] seem interesting. https://raw.githubusercontent.com/BonzaiThePenguin/WikiSort/master/tamc2008.pdf

A.
The fastest stable sort (published last month, based on his quadsort that’s “faster than quicksort”) seems to be: GitHub - scandum/blitsort: Blitsort is an in-place stable adaptive rotate mergesort / quicksort.

Blitsort uses a monobound binary search, which is up to two times faster than the binary search in general use.

Trinity rotation

Blitsort uses a trinity rotation, a new and significantly faster array rotation algorithm.
[…]

By default blitsort uses 512 elements worth of stack memory. […] Blitsort rotate merges recursively, requiring an additional log(n) memory. It’s possible to make this O(1) through the implementation of a stack.

Performance

Blitsort has exceptional performance due to the quad swap, monobound binary search, and trinity rotation. It is likely the fastest in-place stable sort written so far and is about 15% faster than octosort, which is a block merge sort.

Blitsort’s performance is similar to that of quadsort as long as

His quad sort and all the variants are also very well documented, and also WikiSort:

B.
WikiSort vs. std::stable_sort()
(clang++ version 3.2, sorting 0 to 1.5 million items)

Up do 1280% faster, always faster for the many cases in the two tables there, except for one “95% as fast”