Thanks so much for these solutions, they are super helpful! Sorry it took me a little longer to understand how sortperm was working here. Just to clarify: sortperm is returning a vector of row indices ordered by the row-wise max of the values in the two columns?
Also it won’t make a practical difference in this example since they are both so fast, so I understand if you don’t have time to go into it, but I was just wondering why the difference in speed of the two approaches appears relatively large.
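For readers following the sortperm question, here is a minimal sketch of that reading. It assumes the sort key is the row-wise maximum of two columns, as the question describes; the vectors a and b are made-up data, not the data from the thread.

```julia
# Hypothetical data standing in for the two columns.
a = [3, 1, 7, 2]
b = [5, 4, 0, 2]

# Row-wise maximum of the two columns (assumed sort key).
rowmax = max.(a, b)

# sortperm returns the index permutation that would sort rowmax:
# perm[1] is the index of the smallest key, perm[2] the next, and so on.
perm = sortperm(rowmax)

# Indexing with perm reorders the rows by that key.
sorted_a = a[perm]
sorted_b = b[perm]
```

With this data, rowmax is [5, 4, 7, 2] and perm is [4, 2, 1, 3], so row 4 comes first and row 3 last.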
The difference in performance is because the solution using sortperm is a custom approach, while @orderby uses a more general mechanism (and is therefore less optimized).
But looking at the implementation, it does use sortperm:
function orderby_helper(x, args...)
    x, exprs, outer_flags, kw = get_df_args_kwargs(x, args...; wrap_byrow = false)
    t = (fun_to_vec(ex; gensym_names = true, outer_flags = outer_flags) for ex in exprs)
    quote
        $orderby($x, $(t...); $(kw...))
    end
end

function orderby(x::AbstractDataFrame, @nospecialize(args...))
    t = DataFrames.select(x, args...; copycols = false)
    x[sortperm(t), :]
end
The difference is too big to be due to the overhead of select, especially with copycols = false. So I don’t know what’s going on.
EDIT: I see, it’s the difference between sortperm(v::Vector) and sortperm(df::DataFrame). The latter is what @orderby calls, and this requires extra work in case the DataFrame has multiple columns.
There is room for a specialized implementation if the DataFrame has one column, which I think DataFramesMeta.jl can do.
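As a rough sketch of what such a specialization could look like: the helper name sortperm_keys and the column name :x below are hypothetical, and this is not how DataFramesMeta.jl implements it. It only shows that, for a one-column table, sorting the column vector directly gives the same permutation as sortperm on the DataFrame.

```julia
using DataFrames

# Hypothetical one-column table; the column name :x is an assumption.
df = DataFrame(x = [3, 1, 2])

# The path taken by the orderby shown above: select the key columns
# without copying, then call sortperm on the resulting DataFrame.
t = select(df, :x; copycols = false)
perm_df = sortperm(t)

# The direct path: sortperm on the column vector itself.
perm_vec = sortperm(df.x)

# Hypothetical helper: use the vector method when there is exactly one
# key column, and fall back to the general DataFrame method otherwise.
function sortperm_keys(t::AbstractDataFrame)
    return ncol(t) == 1 ? sortperm(t[!, 1]) : sortperm(t)
end

perm_fast = sortperm_keys(t)
```

All three permutations should agree here; timing the two paths with BenchmarkTools.jl on a larger column would show whether the vector method is actually faster on a given setup.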