# Faster default Base.sort! for Julia

There are functions like `factorial` where Julia uses a `factorial_lookup` function to directly return results from a table when the value is lower or equal to 20.

Playing around with sorting algorithms I built a simple one that takes into account the size of the vector to sort just like `factorial` takes into a account the value passed to calculate its factorial.

As it turns out the algorithm taking into account size is consistently faster than the default for Julia up to a size of five elements, hence my question, why does not Julia uses the same strategy for `sort!` as for `factorial` considering how critical sorting algorithms are in computing?

Below follows the code and some benchmarks:

``````function sort2!(x)
swap(i,j) = x[j], x[i] = x[i], x[j]
x[1]>x[2] && swap(1,2)
end

function sort3!(x)
swap(i,j) = x[j], x[i] = x[i], x[j]
if x[1]>x[2] sort2!(x) end
if x[2]>x[3] swap(2,3); sort2!(x) end
end

function sort4!(x)
swap(i,j) = x[j], x[i] = x[i], x[j]
if x[1]>x[2] sort2!(x) end
if x[2]>x[3] swap(2,3); sort2!(x) end
if x[3]>x[4] swap(3,4); sort3!(x) end
end

function sort5!(x)
swap(i,j) = x[j], x[i] = x[i], x[j]
if x[1]>x[2] sort2!(x) end
if x[2]>x[3] swap(2,3); sort2!(x) end
if x[3]>x[4] swap(3,4); sort3!(x) end
if x[4]>x[5] swap(4,5); sort4!(x) end
end
``````

I am just using this algorithm to test the â€śfactorialâ€ť idea, I havenâ€™t done any theoretical work on it to guarantee it is sound, to make sure it works I compared its results multiple times with Base.sort!

Comparing benchmarks for vector of length 3 we have:

``````sort!:        117.439 ns Â± 42.243 ns
sort3!:      103.014 ns Â± 76.325 ns
``````

Comparing benchmarks for vector of length 5 we have:

``````sort!:      135.428 ns Â± 43.606 ns
sort5!:    128.527 ns Â± 60.103 ns
``````

Comparing benchmarks for vector of length 6 we have:

``````sort!:      148.199 ns Â± 45.349 ns
sort6!:    151.984 ns Â± 59.283 ns
``````

If Iâ€™m not mistaken, youâ€™ve basically rediscovered sorting networks.

I guess the simple reason is that when sorting vectors of such short length, 17 nano seconds difference is basically a rounding error, especially with other noise going on in a real-life workload. Itâ€™s more important to do these kinds of optimizations with e.g. factorial or fibonacci implementations, because it vastly reduces the time common calls take. In contrast, most calls to `sort!` have much longer arguments so this would be optimizing the very rare case.

2 Likes

The difference is consistent, theyâ€™re faster, also I just read in the article you provided that sorting networks can be parallelized! In the article we can also read:

An optimal network for size 11 was found in December 2019 by Jannis Harder

Why donâ€™t we parallelize and use that one in Julia?

That depends on what field of science youâ€™re working on, what is rare for some might be common for some othersâ€¦ also, we can always challenge other languages to see how fast they sort short vectors

That can be a nice micro optimization.

It looks like youâ€™ve been bitten by the sorting bug. For more on sorting algorithms, see this thread. I posted 4 links you might find of interest.

At the bottom of the referenced link, I post a couple of links to algorithms that use SIMD and multiple threads. That would be the ultimate sort right now for primitive types. The way to get such a sort incorporated into base would be to first release a package with it. After a while, once itâ€™s been debugged, and lots of people are finding it useful, there is a chance it could be pulled into base.

Or â€¦ maybe it just stays in a package. Packages work great. Although it can be tedious to load two dozen packages and re-add them for Julia releases. I canâ€™t say I know of a better alternative. Package compiler can help somewhat.

It also depends on the philosophy of Julia. I created a post asking about that, but it didnâ€™t get much traction. Do we want everything to be eventually SIMD and multithreaded? Or do people get annoyed when functions launch threads and prefer to have single threaded, unless they ask otherwise?

Micro optimization would be low-level optimizations that do not change the overall structure of the program.

Sorting Networks do change the structure of the program, sure we can then micro optimize them as well, but weâ€™re not talking anything low level now, we are talking about adding a faster algorithm for short arrays than the Julia `sort!` function is using and dispatch those algorithms based on the size of the vector.

2 Likes