WIP: faster string sort

I have switched to VScode and I get faster timings than in Juno already.

The strcmp is quite good as it’s performance only degrades a little when comparing long strings where as the radixsort solution will be bounded by O(ln) where l is the (average?) length of the strings. For longer strings I can see strcmp outperforming radixsort. I tried sorting length 1-8 strings of size 100 million the radix sort took 26 seconds and strcmp took 56 seconds; so in generally radixsort should still be faster. Perhaps the fastest radixsort is to call into a C implementation. But it would be great to not have to resort to C, cos Julia is meant to “solve” the two language problem. Perhaps this is one area where we have to admit defeat? Or Julia will come back with a much more performant way to rearrange the elements of string vectors.

using FastGroupBy

const M=10_000_000; const K=100
srand(1)
svec1 = rand([string(rand(Char.(32:126), rand(1:8))...) for k in 1:M÷K], M)
# using FastGroupBy.radixsort! to sort strings of length 8
@time radixsort!(svec1); # 1.4 seconds
issorted(svec1)

srand(1)
svec1 = rand([string(rand(Char.(32:126), rand(1:8))...) for k in 1:M÷K], M)
strcmp(a::String, b::String) = ccall(:strcmp, Int32, (Cstring, Cstring), pointer(a), pointer(b))
@time sort!(svec1, lt=(x,y)->strcmp(x,y)<0, alg=QuickSort); # 2.7 seconds
issorted(svec1)

srand(1)
svec1 = rand([string(rand(Char.(32:126), rand(1:8))...) for k in 1:M÷K], M)
# using Base.sort! to sort strings of length 8
@time sort!(svec1); # 4 seconds on 10m
issorted(svec1)


srand(1)
svec1 = rand([string(rand(Char.(32:126), rand(1:16))...) for k in 1:M÷K], M)
# using FastGroupBy.radixsort! to sort strings of length 8
@time radixsort!(svec1); # 2.3 seconds
issorted(svec1)

srand(1)
svec1 = rand([string(rand(Char.(32:126), rand(1:16))...) for k in 1:M÷K], M)
strcmp(a::String, b::String) = ccall(:strcmp, Int32, (Cstring, Cstring), pointer(a), pointer(b))
@time sort!(svec1, lt=(x,y)->strcmp(x,y)<0, alg=QuickSort); # 2.7 seconds
issorted(svec1)

srand(1)
svec1 = rand([string(rand(Char.(32:126), rand(1:16))...) for k in 1:M÷K], M)
# using Base.sort! to sort strings of length 8
@time sort!(svec1); # 4.6 seconds on 10m
issorted(svec1)