The latest version of DataConvenience.jl adds the functions fsort and fsort! for faster sorting of DataFrames. Sorry, currently only ascending-order sort is possible. Test out the speed difference for yourself! For me, I get an 8x speed-up on some of the tests.

using DataConvenience, DataFrames
using Random
M = 100_000_000
str_base = [randstring(8) for i in 1:1_000_000]
df = DataFrame(int = rand(Int32, M), float=rand(M), str = rand(str_base, M))
time1 = @elapsed df1 = sort(df, :int);
time2 = @elapsed df2 = fsort(df, :int);
df1 == df2  # results agree
df != df2   # original df was not mutated
time3 = @elapsed df1 = sort(df, :str);
time4 = @elapsed df2 = fsort(df, :str);
df1 == df2
df != df2
time5 = @elapsed df1 = sort(df, [:str, :float]);
time6 = @elapsed df2 = fsort(df, [:str, :float]);
df1 == df2
df != df2
using Plots
using StatsPlots
groupedbar(
repeat(["sort Int","sort String", "sort String and Float64"], inner = 2),
[time1, time2, time3, time4, time5, time6],
group = repeat(["`sort`", "`fsort`"], outer=3),
title = "Sort 100m rows: `sort` vs `fsort`")
savefig("benchmarks/sort_vs_fsort.png")

I’m surprised that there’s still so much room for improvement in simple sorting of vectors. I thought sorting was such a well-studied problem that Base would already have optimal implementations of sort/sortperm.

What’s holding up the replacement of Base’s implementations with your fsort/fsortperm? Sorting is so ubiquitous that this seems like a big performance win for everyone. Are there corner cases or any downsides?

My comment was not based on the performance of DataFrame sorting, but rather on a quick test I had done of Base’s default sort and your fsort.

After further testing, I now see that the downside of RadixSort, as used by fsort, is that its performance doesn’t improve when sorting an array that’s already sorted or mostly sorted, whereas Base’s default sort does take advantage of that.

It’s also good to see that median and quantile have performance in line with RadixSort. So yeah, Base’s sort is fine, but one should be aware that RadixSort will greatly improve performance on mostly unsorted arrays.

julia> using BenchmarkTools, SortingAlgorithms, Statistics
julia> x = rand(10^7);
julia> @btime sort($x);
672.342 ms (2 allocations: 76.29 MiB)
julia> @btime sort($x, alg=RadixSort);
269.020 ms (18 allocations: 152.78 MiB)
julia> sx = sort(x);
julia> @btime sort($sx);
128.143 ms (2 allocations: 76.29 MiB)
julia> @btime sort($sx, alg=RadixSort);
266.842 ms (18 allocations: 152.78 MiB)
julia> @btime median($x);
163.452 ms (2 allocations: 76.29 MiB)
julia> @btime quantile($x, 0.5);
163.203 ms (2 allocations: 76.29 MiB)
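
Given those numbers, a cheap way to get the best of both is to guard the radix sort with an `issorted` check. A minimal sketch (the `hybrid_sort` name is my own, not part of any package):

```julia
using SortingAlgorithms  # provides the RadixSort algorithm used above

# Hypothetical helper: issorted is a single O(n) pass, so the check is
# cheap next to the sort itself. Already-sorted input returns a copy
# immediately; everything else gets the radix sort.
function hybrid_sort(x::AbstractVector)
    issorted(x) && return copy(x)
    return sort(x, alg=RadixSort)
end
```

As in the benchmark above, this only applies to element types RadixSort supports (e.g. Int, Float64).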

IIRC, both SortingAlgorithms.jl and SortingLab.jl use LSD radix sort. Has anyone compared against MSD radix sort?

My naive guess is that MSD is better than LSD for variable-length elements like strings and vectors, and that it also works nicely with task-based parallelism. I also wonder how parallel multi-pivot quicksort compares with parallel radix sort. (In fact, I wrote a parallel MSD radix sort and meant to release it, but I haven’t had time to polish it…)
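
For intuition, here is a minimal out-of-place MSD sketch for unsigned integers (my own illustrative code, not SortingLab.jl’s implementation). The key point is that each bucket is sorted independently, so each recursive call could be handed to Threads.@spawn:

```julia
# Illustrative MSD radix sort on UInt64 keys, one byte at a time,
# starting from the most significant byte (shift = 56).
function msd_radix_sort(v::Vector{UInt64}, shift::Int=56)
    # Base case: small input or all bytes consumed -> comparison sort.
    (length(v) <= 64 || shift < 0) && return sort(v)
    # Scatter into 256 buckets keyed by the current byte.
    buckets = [UInt64[] for _ in 1:256]
    for x in v
        push!(buckets[((x >> shift) % 256) + 1], x)
    end
    # The buckets are independent of one another, which is what makes
    # MSD a natural fit for task-based parallelism (one task per bucket).
    return reduce(vcat, (msd_radix_sort(b, shift - 8) for b in buckets))
end
```

LSD, by contrast, makes a fixed number of full passes starting from the least significant byte, and each pass depends on the stable ordering produced by the previous one, so the passes themselves are inherently sequential.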

Parallel MSD: I had experimented with MSD pre-Julia 1.3, and back then I didn’t understand a lot of concepts like allocations, so it was much slower. I wonder if MSD using threading would work better.