DataFrame sort Performance using Query.jl vs SAS PROC SORT

Dominic_Pazzula · June 11, 2020, 8:00pm

Thanks!

Much better:

BenchmarkTools.Trial:
  memory estimate:  626.88 MiB
  allocs estimate:  1031565
  --------------
  minimum time:     2.857 s (3.83% GC)
  median time:      2.933 s (3.68% GC)
  mean time:        2.933 s (3.68% GC)
  maximum time:     3.008 s (3.53% GC)
  --------------
  samples:          2
  evals/sample:     1

It is still about 3x slower than SAS, but workable.

That makes me think I should never use Query.jl's @orderby during joins, and should sort afterwards instead. We have a lot of SAS code that uses the SAS SQL parser (PROC SQL). We often use it to join or aggregate and order the result in a single step, which reduces the overhead of writing and reading intermediate data. The PROC SQL sort is just as fast as PROC SORT.

Is there an out-of-the-box way to multithread this sort? On a DataFrame this size, I expect it would help a lot.
