For each row, return the column indexes or column names of the n lowest values

Hi all,
New to Julia and seeking a little help. I am working with a DataFrame and, for each row, I want to return either the positions of the lowest n values or the column names of the lowest n values. The DataStructures.jl package (see its Heaps documentation) has a function that does this, but I really want to avoid loading a package just for this, as I’m sure the solution is relatively simple.

Say my dataframe was as follows:

df = DataFrame((name=["S1","S2","S3","S4","S5"],Z1=[41,12,19,17,11],Z2=[13,28,29,99,41],Z3=[7,86,23,12,71],Z4=[4,13,23,11,19],Z5=[41,12,13,19,22],Z6=[11,18,22,46,5]))

And if I were looking for the lowest 2 values, the result would be as follows:

df_sorted = DataFrame((name=["S1","S2","S3","S4","S5"],x1=["Z4","Z1","Z5","Z4","Z6"],x2=["Z3","Z5","Z1","Z3","Z1"]))

I thought maybe a combination of eachrow() and PartialQuickSort might do the trick, but my attempt results in an error.
If anyone has any suggestions, I would very much appreciate it. Thanks for your help!

You could use InMemoryDatasets: transpose each row and then sort the resulting data set (sorting is very fast in that package).

Here’s a good solution. It’s a bit of a complicated function I wrote, though.

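julia> function sort_row_return_top_two(row)
           # `row` is a NamedTuple; turn it into a vector of name => value pairs
           vec_pairs = collect(pairs(row))
           # Sort ascending by the 2nd element in the pair (the value),
           # not by the name
           sorted_vec_pairs = sort(vec_pairs, by = last)
           # Get the names of the top two, i.e. the last two entries of the
           # ascending sort (use [1:2] instead for the two lowest values)
           string.(first.(sorted_vec_pairs[end-1:end]))
       end;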

julia> select(df, "name", AsTable(r"^Z") => ByRow(sort_row_return_top_two) => ["first","second"])
5×3 DataFrame
 Row │ name    first   second 
     │ String  String  String 
─────┼────────────────────────
   1 │ S1      Z1      Z5
   2 │ S2      Z2      Z3
   3 │ S3      Z4      Z2
   4 │ S4      Z6      Z2
   5 │ S5      Z2      Z3
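
Note that the output above lists the columns holding the two largest values in each row (for S1, Z1 and Z5 are both 41), while the question asks for the lowest n. Below is a minimal sketch of a lowest-n variant that needs only Base Julia. It assumes each row arrives as a NamedTuple, which is what AsTable(...) => ByRow(f) passes to f. The helper names lowest_n_positions and lowest_n_names are made up for illustration, and partialsortperm is used here as a stand-in for the PartialQuickSort idea from the question.

```julia
# Sketch only: `lowest_n_positions` and `lowest_n_names` are hypothetical helpers.
# `row` is assumed to be a NamedTuple of numeric values (one DataFrame row).

# Positions of the n smallest values, ordered from smallest to largest.
# `collect(row)` gathers the values of the NamedTuple into a vector.
lowest_n_positions(row::NamedTuple, n::Integer) =
    partialsortperm(collect(row), 1:n)

# Column names (as strings) of the n smallest values, in the same order.
lowest_n_names(row::NamedTuple, n::Integer) =
    [string(keys(row)[i]) for i in lowest_n_positions(row, n)]

# First row of the example data (S1), Z columns only.
s1 = (Z1 = 41, Z2 = 13, Z3 = 7, Z4 = 4, Z5 = 41, Z6 = 11)

lowest_n_positions(s1, 2)  # positions 4 and 3 (the values 4 and 7)
lowest_n_names(s1, 2)      # "Z4" and "Z3"
```

With DataFrames loaded, this should slot into the same pattern as above, e.g. select(df, "name", AsTable(r"^Z") => ByRow(r -> lowest_n_names(r, 2)) => ["x1", "x2"]). Only the n smallest entries get ordered, so it avoids sorting the whole row, though for six columns the difference is negligible.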