Thanks!
Much better:
BenchmarkTools.Trial:
  memory estimate:  626.88 MiB
  allocs estimate:  1031565
  --------------
  minimum time:     2.857 s (3.83% GC)
  median time:      2.933 s (3.68% GC)
  mean time:        2.933 s (3.68% GC)
  maximum time:     3.008 s (3.53% GC)
  --------------
  samples:          2
  evals/sample:     1
Still slower than SAS by a factor of 3, but workable.
That makes me think I should never use @orderby from Query.jl during joins, and should sort afterwards instead. We have a lot of SAS code that uses the SAS SQL parser. We often use it to order our joins and aggregations in a single step, which reduces the overhead of writing and reading intermediate data. The PROC SQL sort is just as fast as PROC SORT.
Is there an out-of-the-box way to multi-thread this sort? I expect it would help a lot on a DataFrame this size.