Thank you all again for being so generous with your insight and knowledge. It is incredibly helpful and much appreciated. I marked @Dan's original answer as the solution because of the simplicity of the code and because, as bkamins and nilshg pointed out, the difference in computation time doesn’t necessarily carry over to larger tables. However, I might be wrong, but is it the case that the DataFramesMeta solution
should be modified to
@orderby(df7, -max.(AsTable(r"id")...))
I am likely missing something here, but in the following example each of these solutions generates the same DataFrame:
using DataFrames
df7 = DataFrame(Aid = rand(-100:100, 10000000), Bid = rand(-100:100, 10000000), C = rand(-100:100,10000000), Did = rand(-100:100,10000000))
mxbyrow = df7[sortperm(select(df7, r"id" => ByRow(max) => :maxid); rev=true),:]
mxclmn = df7[sortperm(max.(df7.Aid, df7.Bid, df7.Did),rev = true), :]
mxeachclmn = df7[sortperm(max.(eachcol(df7[!, r"id"])...),rev = true), :]
mxfltr = df7[ sortperm(max.([df7[!,c] for c in filter(n-> endswith(n,"id"), names(df7))]...), rev=true),:]
using DataFrameMacros
mxmacro = @sort(df7, -maximum({{r"id"}}))
mxfltr == mxclmn == mxeachclmn == mxmacro == mxbyrow
true
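As far as I can tell, the variants that negate the key (as in the @sort call) and the ones that pass rev=true agree because sortperm is stable by default, so tied rows keep their original relative order either way. A minimal Base-only sketch of that, where the vector `key` is just a made-up stand-in for the row-wise maxima:

```julia
# Hypothetical row-wise maxima with ties (made-up values for illustration).
key = [3, 7, 3, 9, 7]

# Descending order via rev=true; ties stay in their original index order.
perm_rev = sortperm(key; rev = true)

# Descending order via negating the key; ties again stay in index order.
perm_neg = sortperm(-key)

println(perm_rev)              # permutation from rev=true
println(perm_neg)              # permutation from the negated key
println(perm_rev == perm_neg)  # whether the two orderings agree
```

This only covers plain vectors; I am assuming the DataFrame versions above behave the same way.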
However, with the DataFramesMeta approach
using DataFramesMeta
mxmeta = @orderby df7 -maximum(AsTable(r"id"))
mxclmn == mxmeta
false
But if I use the syntax from @bkamins's previous post (Sort DataFrame by the greater of multiple columns)
mxmeta2 = @orderby(df7, -max.(AsTable(r"id")...))
then
mxclmn == mxmeta2
true
so
@orderby(df7, -max.(AsTable(r"id")...)) == @orderby df7 -maximum(AsTable(r"id"))
false
Is this correct? I am probably missing something, but I can’t quite figure out how @orderby df7 -maximum(AsTable(r"id")) is ranking the rows.
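My best guess, which I have not verified against the DataFramesMeta internals: if AsTable(r"id") hands the expression a NamedTuple of whole columns, then maximum would compare the column vectors with each other (lexicographically) and return one entire column, whereas max. with splatting takes the largest value within each row. A plain-Julia sketch of that difference, where the small NamedTuple `cols` is a hypothetical stand-in for what the macro passes along:

```julia
# Hypothetical stand-in for the input: a NamedTuple whose values are whole
# column vectors (an assumption, not confirmed DataFramesMeta behaviour).
cols = (Aid = [1, 5, 2], Bid = [3, 2, 9], Did = [0, 7, 4])

# maximum over the NamedTuple compares the vectors themselves
# (lexicographically) and returns ONE whole column.
whole_column_max = maximum(cols)

# Splatting and broadcasting max compares element by element,
# giving the maximum of each row.
rowwise_max = max.(cols...)

# Sorting in descending order by each key yields different row orders.
perm_column = sortperm(-whole_column_max)
perm_rowwise = sortperm(-rowwise_max)

println(whole_column_max)
println(rowwise_max)
println(perm_column == perm_rowwise)
```

If that assumption holds, the maximum version would effectively sort by a single column rather than by the row-wise maximum, which would explain why the two results differ.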
Just as an aside on speed: if the rows are sorted in ascending rather than descending order, the non-macro approaches still seem to be faster (a median of roughly 197 ms versus 1.1 s in the two runs below).
julia> @benchmark @orderby($df7, max.(AsTable(r"id")...))
BenchmarkTools.Trial: 5 samples with 1 evaluation.
Range (min … max): 967.029 ms … 1.118 s ┊ GC (min … max): 0.00% … 13.52%
Time (median): 1.105 s ┊ GC (median): 12.17%
Time (mean ± σ): 1.078 s ± 62.659 ms ┊ GC (mean ± σ): 10.06% ± 5.54%
█ █ ██ █
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁██▁▁▁█ ▁
967 ms Histogram: frequency by time 1.12 s <
Memory estimate: 2.24 GiB, allocs estimate: 2957653.
julia> @benchmark $df7[sortperm(max.(eachcol($df7[!, r"id"])...)), :]
BenchmarkTools.Trial: 25 samples with 1 evaluation.
Range (min … max): 193.540 ms … 233.784 ms ┊ GC (min … max): 0.00% … 16.63%
Time (median): 196.598 ms ┊ GC (median): 0.00%
Time (mean ± σ): 200.730 ms ± 10.667 ms ┊ GC (mean ± σ): 1.98% ± 4.84%
▁▄▁█▁ ▁▁
█████▆▆▁▆██▆▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▁▁▁▁▁▆ ▁
194 ms Histogram: frequency by time 234 ms <
Memory estimate: 457.77 MiB, allocs estimate: 59.