base/sort.jl has the algorithms InsertionSort, MergeSort, and QuickSort, yet I hear folks talk about counting, radix, and TimSort builtin to julia (e.g. https://julia-ylwu.readthedocs.io/en/latest/stdlib/sort.html for TimSort). I couldn’t find any other algorithms from methods(sort!) or methods(sort) within Julia’s standard library, but might be missing something.
Is there some reason Julia’s Base.sort! doesn’t use TimSort, dual-pivot quick sort, quadsort, or some-such as a default sorting algorithm with multiple dispatch to radix, counting, etc. for special cases?
I imagine that it takes quite some thought and discussion to pick a language/stdlib’s default sorting algorithm(s), but haven’t been able to find that discussion for Julia. Is anyone aware of where that discussion is?
The short answer is we haven’t done it yet because it’s a lot of effort and no one has bothered yet. I think the correct default is probably something along the lines of radix sort where applicable, and timsort otherwise. One thing that might be really interesting is combing radix sort with insertion sort/timsort for things like BigFloat. One round of MSD radix sort will often do a good job of producing nearly sorted sequences which insertion sort will do a very good job on.
TimSort was implemented in Base at one point and it wasn’t actually the big win that it is in Python and Java. This may be because Julia tends to organize arrays quite differently from those languages, which both use pointer-heavy data structures whereas Julia tends to use inline (immutable) data structures more. That implementation was moved into SortingAlgorithms, where it still lives. It’s quite possible that the performance characteristics are different now since Julia’s compiler has evolved quite a bit as has the hardware that people use it on.
It would be a worthwhile project for someone to figure out what the best set of algorithms for Base would be. I agree that radix sort seems like it would be a win for floats and integers. TimSort might be better for some kinds of data structures or data patterns but not others. I’m not sure what the benefit of the dual pivot quick sort algorithm is supposed to be — I vaguely recall trying it and not seeing an improvement. I don’t think anyone has tried quadsort.
Maybe do a performance study so that a decision can be informed
This seems like a necessary step. Does anyone know of real world sorting benchmark data or should I stick with the random, sawtooth, pip-organ, etc. from the literature?
It would be great to have more sorting algorithms implemented, but ideally, experimentation with sorting algorithms should happen in packages, possibly those mentioned in this topic.
Putting things in Base ties new releases to Julia release cycles and makes breaking changes to the API practically impossible. The language includes commonly used workhorses, but new sorting algorithms should only be added to Base if they are somehow a major improvement for generic sorting problems (unlikely, but who knows), or of they are needed for something that Base or the standard libraries require.
We can change out sorting algorithm without affecting the API. The main thing to get out of this isn’t allowing user customization of sorting algorithm, but just picking better defaults.
Thank you for these links! They look like impressive alternatives for where Julia currently uses QuickSort.
I’m planning to start with a stable sorting algorithm to replace MergeSort. QuadSort (which claims to be faster than TimSort for pretty much all inputs) is the top contender I’ve found so far.
Blitsort uses a monobound binary search, which is up to two times faster than the binary search in general use.
Trinity rotation
Blitsort uses a trinity rotation, a new and significantly faster array rotation algorithm.
[…]
By default blitsort uses 512 elements worth of stack memory. […] Blitsort rotate merges recursively, requiring an additional log(n) memory. It’s possible to make this O(1) through the implementation of a stack.
Performance
Blitsort has exceptional performance due to the quad swap, monobound binary search, and trinity rotation. It is likely the fastest in-place stable sort written so far and is about 15% faster than octosort, which is a block merge sort.
Blitsort’s performance is similar to that of quadsort as long as
His quad sort and all the variants are also very well documented, and also WikiSort:
B. WikiSort vs. std::stable_sort()
(clang++ version 3.2, sorting 0 to 1.5 million items)
Up do 1280% faster, always faster for the many cases in the two tables there, except for one “95% as fast”