DataFrame sort Performance using Query.jl vs SAS PROC SORT

Thanks!

Much better:

BenchmarkTools.Trial:
  memory estimate:  626.88 MiB
  allocs estimate:  1031565
  --------------
  minimum time:     2.857 s (3.83% GC)
  median time:      2.933 s (3.68% GC)
  mean time:        2.933 s (3.68% GC)
  maximum time:     3.008 s (3.53% GC)
  --------------
  samples:          2
  evals/sample:     1

Still slower than SAS by a factor of 3x, but workable.

That makes me think that I should never use the @orderby from Query.jl during joins and sort later. We have a lot of SAS code which uses the SAS SQL parser. Often we use that to order our joins and aggregations in a single step – reducing the overhead of writing and reading. The PROC SQL sort is just as fast as PROC SORT.

Is there an out of the box way to multi-thread this sort? I expect on a DataFrame this size it would help a lot.

1 Like