Progress towards faster `sortperm` for Strings

xiaodai · January 22, 2018, 12:46pm

Thanks to @ChrisRackauckas’s great suggestion, the best known Julia implementation of string sort now beats R when the ratio of unique values to vector length is greater than 3:10. This is for one particular setup: 10m id strings with a common prefix and a varying number of unique strings.

As can be seen, Julia’s radix sort outperforms R’s when the number of unique values is large. In this particular case I have a 10m-length vector, and the crossover point is at 3m unique values.
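For anyone who wants to try something similar, here is a minimal sketch of this kind of setup. It is not the exact benchmark code: the helper name `make_ids`, the `"id"` prefix, the zero-padding and the smaller vector length are all assumptions made for illustration, and it times Base's `sortperm` as a baseline rather than the radix sort discussed above.

```julia
using Random

# Hypothetical helper (not from the original benchmark): draw `n` id strings
# from a pool of `n_unique` distinct values that all share the prefix "id".
function make_ids(n::Integer, n_unique::Integer; rng::AbstractRNG = MersenneTwister(0))
    width = ndigits(n_unique)                        # pad so every id has the same length
    pool = ["id" * lpad(i, width, '0') for i in 1:n_unique]
    return rand(rng, pool, n)                        # sample with replacement
end

# Scaled-down version of the setup: the post uses a 10m-length vector,
# here 100k elements with a 3:10 ratio of unique values to length.
n = 100_000
n_unique = 30_000
ids = make_ids(n, n_unique; rng = MersenneTwister(1))

# Baseline timing with Base's sortperm (assumption: stands in for the
# faster implementation, which is not reproduced here).
sortperm(ids[1:10])                                  # warm-up call to exclude compilation time
t = @elapsed perm = sortperm(ids)
println("sortperm on $n strings took $t seconds")
```

Varying `n_unique` while holding `n` fixed gives the kind of sweep over the unique-to-length ratio described above.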

It is also true that R takes significantly longer to generate the synthetic data, possibly because it builds a global cache for string interning, or it could just be naturally slower. It’s hard to separate the two effects.
