Sorting a DataFrame based on a multi-column function

Is there an easier way to do this?
sort(transform(df, [:b, :c] => ByRow(max) => :__max__), :__max__, rev=true)[:, names(df)]

I want to sort based on a function of more than one column (in this case the max of the selected columns), hopefully without adding an extra column.

Example df:
df = DataFrame(a=["d", "c", "b", "a"], b=[1,2,3,4], c=[-2, 5, 1,2])

I tried these, but they did not work:
sort(df, [:b, :c], by=maximum)
sort(df; by=row -> max(row...))
sort(df; by=row -> max(row.b, row.c)) ← Google AI told me this would work. Google AI was wrong
sort(df, [:b, :c]; by=ByRow((b,c) -> max(b,c)))
sort(df, [:b, :c]; by=(b,c) -> max(b,c))

Is there any way to sort based on a function of multiple columns (other than my workaround at the top)?

From what I understand, sort treats the columns independently (by is applied to one column at a time), so it does not allow the kind of combination you want.
Instead, you could just use indexing to a similar effect:

df[sortperm(max.(df.b, df.c); rev=true), names(df)]
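Spelled out step by step on the example df from the question, the indexing approach looks roughly like this. The names key, perm and sorted are only illustrative, and this is a sketch rather than the one canonical way to do it:

```julia
using DataFrames

df = DataFrame(a=["d", "c", "b", "a"], b=[1, 2, 3, 4], c=[-2, 5, 1, 2])

# Row-wise sort key: the larger of b and c in each row.
key = max.(df.b, df.c)           # [1, 5, 3, 4]

# Row indices that put the key in descending order.
perm = sortperm(key; rev=true)   # [2, 4, 3, 1]

# Index rows by the permutation; `:` keeps every column.
# This builds a new DataFrame and leaves df unchanged.
sorted = df[perm, :]             # sorted.a == ["c", "a", "b", "d"]
```

No helper column is ever added to df, since the key only exists as a temporary vector.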

Or even

df[sortperm(max.(df.b, df.c); rev=true), :]

:wink:


In DataFramesMeta.jl this is

@orderby df max.(:b, :c)
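Note that @orderby sorts in ascending order, while the question used rev=true. One way to get the descending order, assuming the key is numeric as it is here, is to negate it. A small sketch on the example df (asc and desc are just illustrative names):

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a=["d", "c", "b", "a"], b=[1, 2, 3, 4], c=[-2, 5, 1, 2])

# Ascending by the row-wise max, same as the one-liner above.
asc = @orderby(df, max.(:b, :c))

# Descending: negate the key (only works for numeric keys).
desc = @orderby(df, -max.(:b, :c))
```

Here desc should match the result of the sortperm version with rev=true.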