# Sort rows in a dataframe based on a predefined order

New to Julia, and have taken (much) effort in seeking a solution to the ‘simple’ question but failed.

``````julia> df = DataFrame(col1= ['a', 'b', 'c'], col2 = [1,2,3])
3×2 DataFrame
Row │ col1  col2
│ Char  Int64
─────┼─────────────
1 │ a         1
2 │ b         2
3 │ c         3

julia> order1 = ["b", "a", "c"]
3-element Vector{String}:
"b"
"a"
"c"
``````

THANK YOU!

One thing to note, in Julia, unlike python and R, `'a'` is not the same as `"a"`. The first is a character, the second is a string.

This is actually kind of hard. I can’t think of a good solution using DataFrames.jl’s `sort` function. So instead here is a solution using DataFramesMeta.jl.

``````julia> df = DataFrame(col1= ["a", "b", "c"], col2 = [1,2,3]);

julia> order1 = ["b", "a", "c"];

julia> using DataFramesMeta

julia> @rorderby df findfirst(==(:col1), order1)
3×2 DataFrame
Row │ col1    col2
│ String  Int64
─────┼───────────────
1 │ b           2
2 │ a           1
3 │ c           3
``````

I hope someone things of something better!

3 Likes

Alternative without `DataFramesMeta` package:

``````julia> df.order=[findfirst(df.col1.==x) for x in order1]
3-element Vector{Int64}:
2
1
3

julia> sort!(df, :order)
3×3 DataFrame
Row │ col1  col2   order
│ Char  Int64  Int64
─────┼────────────────────
1 │ b         2      1
2 │ a         1      2
3 │ c         3      3

julia> select!(df, Not(:order))
3×2 DataFrame
Row │ col1  col2
│ Char  Int64
─────┼─────────────
1 │ b         2
2 │ a         1
3 │ c         3
``````
2 Likes

You could also consider (initial post had the arguments order swapped, corrected now):

``````df[indexin(order1, df.col1),:]
``````

result:

``````3×2 DataFrame
Row │ col1    col2
│ String  Int64
─────┼───────────────
1 │ b           2
2 │ a           1
3 │ c           3
``````
5 Likes

Thanks Rafael,
Your code works on this example! But it doesn’t for the dataframe below:

``````julia> df2
5×2 DataFrame
│ String         Int64
─────┼─────────────────────────
1 │ RD58-100pg_S7   6096257
2 │ RD58-10ng_S10  12590390
3 │ RD58-10pg_S6    2216467
4 │ RD58-1ng_S9     5381029
5 │ RD58-500pg_S8   6870582

julia> myOrder = ["RD58-10pg_S6", "RD58-100pg_S7", "RD58-500pg_S8", "RD58-1ng_S9", "RD58-10ng_S10"]
5-element Vector{String}:
"RD58-10pg_S6"
"RD58-100pg_S7"
"RD58-500pg_S8"
"RD58-1ng_S9"
"RD58-10ng_S10"

julia> df2[indexin(df2.sample, myOrder),:]
5×2 DataFrame
│ String         Int64
─────┼─────────────────────────
1 │ RD58-10ng_S10  12590390
2 │ RD58-500pg_S8   6870582
3 │ RD58-100pg_S7   6096257
4 │ RD58-1ng_S9     5381029
5 │ RD58-10pg_S6    2216467
``````

@pdeffebach’s code works well.

``````julia> @rorderby df2 findfirst(==(:sample), myOrder)
5×2 DataFrame
│ String         Int64
─────┼─────────────────────────
1 │ RD58-10pg_S6    2216467
2 │ RD58-100pg_S7   6096257
3 │ RD58-500pg_S8   6870582
4 │ RD58-1ng_S9     5381029
5 │ RD58-10ng_S10  12590390
``````

Thanks!

Sorry, the order of the arguments needs to be swapped (corrected order in post above).

``````using DataFrames
col1 = ["RD58-100pg_S7", "RD58-10ng_S10", "RD58-10pg_S6", "RD58-1ng_S9", "RD58-500pg_S8"]
col2 = [6096257, 12590390, 2216467, 5381029,6870582]
df = DataFrame(col1=col1, col2=col2)
order1 = ["RD58-10pg_S6", "RD58-100pg_S7", "RD58-500pg_S8", "RD58-1ng_S9", "RD58-10ng_S10"]
``````

Herein the result:

``````df[indexin(order1, df.col1),:]

5×2 DataFrame
Row │ col1           col2
│ String         Int64
─────┼─────────────────────────
1 │ RD58-10pg_S6    2216467
2 │ RD58-100pg_S7   6096257
3 │ RD58-500pg_S8   6870582
4 │ RD58-1ng_S9     5381029
5 │ RD58-10ng_S10  12590390
``````
1 Like