In the latest version of DataConvenience.jl you get the functions `fsort` and `fsort!` for faster sorting of DataFrames. Sorry, currently only ascending-order sorting is possible. Test out the speed difference for yourself! I get up to an 8x speed-up in some of my tests.
```julia
using DataConvenience, DataFrames
using Random

M = 100_000_000
str_base = [randstring(8) for i in 1:1_000_000]
df = DataFrame(int = rand(Int32, M), float = rand(M), str = rand(str_base, M))

time1 = @elapsed df1 = sort(df, :int);
time2 = @elapsed df2 = fsort(df, :int);
df1 == df2
df != df2

time3 = @elapsed df1 = sort(df, :str);
time4 = @elapsed df2 = fsort(df, :str);
df1 == df2
df != df2

time5 = @elapsed df1 = sort(df, [:str, :float]);
time6 = @elapsed df2 = fsort(df, [:str, :float]);
df1 == df2
df != df2

using Plots
using StatsPlots

groupedbar(
    repeat(["sort Int", "sort String", "sort String and Float64"], inner = 2),
    [time1, time2, time3, time4, time5, time6],
    group = repeat(["`sort`", "`fsort`"], outer = 3),
    title = "Sort 100m rows: `sort` vs `fsort`")
savefig("benchmarks/sort_vs_fsort.png")
```
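The benchmark above builds 100m rows, which needs several gigabytes of memory. If you just want a quick check, here is a scaled-down sketch of the same comparison. The row count, the seed, and the variable names (`small`, `sorted_base`, `sorted_fast`) are my own choices for illustration. The warm-up calls are an addition so that the timings do not include first-call compilation. Treat the numbers as a rough indication only; the speed-up at this size may differ from the 100m-row case.

```julia
using DataConvenience, DataFrames
using Random

Random.seed!(1)        # fixed seed so repeated runs use the same data
N = 1_000_000          # assumption: 1m rows instead of 100m
str_base = [randstring(8) for _ in 1:10_000]
small = DataFrame(int = rand(Int32, N), float = rand(N), str = rand(str_base, N))

# Warm-up calls so the timings below exclude compilation time.
sort(small, :int)
fsort(small, :int)

t_sort  = @elapsed sorted_base = sort(small, :int)
t_fsort = @elapsed sorted_fast = fsort(small, :int)

# Compare only the sort key; this does not depend on how ties are ordered.
same_keys = sorted_base.int == sorted_fast.int

println("sort: ", t_sort, " s, fsort: ", t_fsort, " s, ratio: ", t_sort / t_fsort)
println("same key order: ", same_keys)
```

For steadier timings you could repeat each call a few times and take the minimum, since a single `@elapsed` measurement can be noisy.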
Ideally, contributing a faster `sortperm` back to Base or SortingAlgorithms.jl would be key to performance. However, until that happens you have