Thanks to @ChrisRackauckas’s great suggestion, the best known Julia implementation of string sorting now beats R when the ratio of unique values to vector length is greater than 3:10. The setup here is a vector of 10m id strings sharing a common prefix, with a varying number of unique strings.
As can be seen, Julia’s radix sort outperforms R’s when the number of unique values is large. In this particular case the vector has length 10m, and the crossover point is at 3m unique values.
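For anyone who wants to try a similar setup, here is a rough sketch of how the synthetic data could be generated and timed in Julia. It is not the benchmark code behind the numbers above: `make_ids` is a hypothetical helper, the `"id"` prefix and zero-padding are assumptions, the vector is 1m rather than 10m so it runs quickly, and it times Base `sort` (a comparison sort) rather than the radix sort discussed here. Since the ids are sampled with replacement, `n_unique` is only an upper bound on the number of distinct values that actually appear.

```julia
using Random

# Hypothetical helper (not from the original benchmark): draw `n` ids from a
# pool of `n_unique` distinct strings that all share the prefix "id".
function make_ids(n::Integer, n_unique::Integer; rng = Random.default_rng())
    width = ndigits(n_unique)  # zero-pad so every id has the same length
    pool = [string("id", lpad(i, width, '0')) for i in 1:n_unique]
    return rand(rng, pool, n)  # sample with replacement
end

n = 1_000_000  # assumption: smaller than the 10m used in the post

sort(make_ids(1_000, 10))  # warm-up call so compilation is not timed

for n_unique in (n ÷ 100, n ÷ 10, 3n ÷ 10, n ÷ 2)
    ids = make_ids(n, n_unique)
    t = @elapsed sort(ids)  # Base sort, NOT the radix sort from the post
    println("unique = ", n_unique,
            "  ratio = ", n_unique / n,
            "  sort time = ", round(t, digits = 3), " s")
end
```

Swapping the `sort(ids)` call for whichever string sort is being compared, and running the equivalent generation in R, should give the two timing curves whose crossover is described above.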
R also takes significantly longer to generate the synthetic data, possibly because it builds a global cache for string interning, or it could just be naturally slower. It is hard to separate the two effects.