WIP: faster string sort

Before going to sleep I tried these two ideas, expecting nothing but…

String sorting performance vs R is only marginally (within 1.x-1.3x) slower now! In my benchmarks sorting 10m records where 100k of them are distinct yields 1.31secs (Julia) vs 1.1secs (R) (for 100m it’s around 25-27 seconds vs 18 -21 seconds).

sort_perf_10m100_u

I made these 3 changes

  1. Use my own sort function instead of Base.sort which is hard to overload as Base.sort doesn’t dispatch on alg, or more generally on named arguments. Some how Base.sort(svec, alg = StringRadixSort) is running some code which probably wasn’t all that necessary.
  2. Change the SortingAlgorithms.RADIXSIZE to 16 from 11. This means sort 16 bits in one go instead of 11 bits. After getting comfortable with radixsort, I was curious why SortingAlgorithms.jl chose 11 as the RADIXSIZE, the most common types are probably multiples of 16 bits, e.g. UInt32 is 32 bits and UInt64 is 64 bits. Sorting 11 bits in one go, we need 3 passes over the data to sort 32 bit type and 6 passes over the data for a 64 bit type vs 2 and 4 passes if we just choose a RADIXSIZE of 16 bits. From benchmarking, increasing the RADIXSIZE results in faster sorting too.
  3. Do not load underlying bits into UInt128 and instead use UInt64 as the maximum. I found that it’s quicker to sort twice using UInt64 instead of sorting once UInt128. When I said sort “once” the data has been gone through 4 times when sorting the underlying bytes as 64bit; so really the number of passes over the data is the same whether you use UInt128 or UInt64 but it seems Julia can handle 64bit data better which makes sense given that I am running on a 64bit machine.
9 Likes