I have just implemented an idea I had to make
sortperm faster for Strings. It is implemented in
fsortperm; and I got the performance on a particular benchmark to be 7x faster than
Base.sortperm, see code here
Here is a rough sketch of how it works
- For each value in a string vector, load the last 4 bytes of the string into x::
UInt64where the last 32 bits are all 0 and only the first 32 bits are populated with 4 bytes of the string
- Populate the last 32 bits of
xwith the string’s position in the array
- use a modified LSD radix sort that only sorts on the first 32 bits; but note that the position of the string is also being permuted at the same time
- Obtain the last 4 bytes of each of the elements in the sorted
x's and store them in
ss; note that
ssis the original index position permuted
- Create a view of the string vector permuted by
- repeat from step 1, but instead of loading the last 4 bytes from the string vector, load the next 4 bytes (counting in reverse) in the string vector view from step 5, and populate the last 32 bits of each
xwith the corresponding
ssvalues; this is the iterative sorting step in LSD radixsort
Basically, I found that it’s less expensive to pack 4 bytes of a string and its array position into one 64 bit value and sort that instead of sorting the strings and keeping another vector of permuted indexes (as in this SortingAlgorithms’s unmerged PR).
However, R’s string performance is another total beast due to data.table’s optimization ideas and clever user of the global cache and string interning. But Julia’s getting closer even without string interning.
Slow sorting performance in Julia has been documented before. But I feel that my approach is too low-level to be a long term solution, but it’s a worthy experiment.