Before going to sleep I tried these two ideas, expecting nothing but…

**String sorting performance vs R is only marginally (within 1.x-1.3x) slower now!** In my benchmarks sorting 10m records where 100k of them are distinct yields 1.31secs (Julia) vs 1.1secs (R) (for 100m it’s around 25-27 seconds vs 18 -21 seconds).

I made these 3 changes

- Use my own sort function instead of
`Base.sort`

which is hard to overload as`Base.sort`

doesn’t dispatch on`alg`

, or more generally on named arguments. Some how`Base.sort(svec, alg = StringRadixSort)`

is running some code which probably wasn’t all that necessary. - Change the
`SortingAlgorithms.RADIXSIZE`

to 16 from 11. This means sort 16 bits in one go instead of 11 bits. After getting comfortable with radixsort, I was curious why`SortingAlgorithms.jl`

chose 11 as the`RADIXSIZE`

, the most common types are probably multiples of 16 bits, e.g. UInt32 is 32 bits and UInt64 is 64 bits. Sorting 11 bits in one go, we need 3 passes over the data to sort 32 bit type and 6 passes over the data for a 64 bit type vs 2 and 4 passes if we just choose a`RADIXSIZE`

of 16 bits. From benchmarking, increasing the`RADIXSIZE`

results in faster sorting too. - Do not load underlying bits into
`UInt128`

and instead use`UInt64`

as the maximum. I found that it’s quicker to sort twice using`UInt64`

instead of sorting once`UInt128`

. When I said sort “once” the data has been gone through 4 times when sorting the underlying bytes as 64bit; so really the number of passes over the data is the same whether you use`UInt128`

or`UInt64`

but it seems Julia can handle 64bit data better which makes sense given that I am running on a 64bit machine.