[ANN] SortingLab.jl - fast sorting algorithms for strings and CategoricalArrays

I have updated the package to support Julia v1. There is no-intention of supporting Julia v0.6.

Benchmarks

sort_vs_radixsort
sortperm_vs_fsortperm

Example usage

using SortingLab;
import Test: @test

N = 1_000_000;
K = 100;

# faster string sort
svec = rand("id".*string.(1:N÷K, pad=10), N);
svec_sorted = radixsort(svec);
issorted(svec_sorted) # true
issorted(svec) # false

# faster string sortperm
sorted_idx = fsortperm(svec)
issorted(svec[sorted_idx]) #true

# in place string sort
radixsort!(svec);
issorted(svec) # true

# CategoricalArray sort
using CategoricalArrays
pools = "id".*string.(1:100,3);
byvec = CategoricalArray{String, 1}(rand(UInt32(1):UInt32(length(pools)), N), CategoricalPool(pools, false));
byvec = compress(byvec);

byvec_sorted = fsort(byvec);
@test issorted(byvec_sorted)

# in place CategoricalArray sort
fsort!(byvec)
@test issorted(byvec)
15 Likes

Great, work! Thank you for amazing library! Have you considered replicating the Base’s sorting API, so that users can just swap out the backend without having to refactor the code. This would certainly make your library more popular :slight_smile:

There is a thread about improving the sort API.

I tried to incorporate my sorting algorithm into SortingAlgorithms.jl but I found that there is a fundamental issue with how base dispatches on the sorting data. What I wrote is difficult to understand and that is because the way the dispatch works is quite complicated and can’t be summarises and understood easily IMO.

I hope there is progress soon on the API and then we can incorporate.