Before going to sleep I tried these two ideas, expecting nothing but…
String sorting performance vs R is only marginally (within 1.x-1.3x) slower now! In my benchmarks sorting 10m records where 100k of them are distinct yields 1.31secs (Julia) vs 1.1secs (R) (for 100m it’s around 25-27 seconds vs 18 -21 seconds).
I made these 3 changes
- Use my own sort function instead of
Base.sort
which is hard to overload asBase.sort
doesn’t dispatch onalg
, or more generally on named arguments. Some howBase.sort(svec, alg = StringRadixSort)
is running some code which probably wasn’t all that necessary. - Change the
SortingAlgorithms.RADIXSIZE
to 16 from 11. This means sort 16 bits in one go instead of 11 bits. After getting comfortable with radixsort, I was curious whySortingAlgorithms.jl
chose 11 as theRADIXSIZE
, the most common types are probably multiples of 16 bits, e.g. UInt32 is 32 bits and UInt64 is 64 bits. Sorting 11 bits in one go, we need 3 passes over the data to sort 32 bit type and 6 passes over the data for a 64 bit type vs 2 and 4 passes if we just choose aRADIXSIZE
of 16 bits. From benchmarking, increasing theRADIXSIZE
results in faster sorting too. - Do not load underlying bits into
UInt128
and instead useUInt64
as the maximum. I found that it’s quicker to sort twice usingUInt64
instead of sorting onceUInt128
. When I said sort “once” the data has been gone through 4 times when sorting the underlying bytes as 64bit; so really the number of passes over the data is the same whether you useUInt128
orUInt64
but it seems Julia can handle 64bit data better which makes sense given that I am running on a 64bit machine.