Xiao
September 16, 2021, 3:48pm
1
I'm new to Julia and have put a lot of effort into finding a solution to this ‘simple’ question, without success.
julia> df = DataFrame(col1= ['a', 'b', 'c'], col2 = [1,2,3])
3×2 DataFrame
Row │ col1 col2
│ Char Int64
─────┼─────────────
1 │ a 1
2 │ b 2
3 │ c 3
julia> order1 = ["b", "a", "c"]
3-element Vector{String}:
"b"
"a"
"c"
Please help me sort ‘df’ so that its rows follow ‘order1’.
THANK YOU!
One thing to note: in Julia, unlike Python and R, 'a' is not the same as "a". The first is a character, the second is a string.
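To make the Char/String point concrete, here is a small sketch using only Base Julia. The vector name `chars` is made up for illustration; it stands in for a column like `df.col1` from the question.

```julia
# A Char and a one-letter String are different values of different types.
@assert 'a' isa Char
@assert "a" isa String
@assert 'a' != "a"

# Because of that, looking up Strings in a Char column finds nothing.
chars = ['a', 'b', 'c']
@assert indexin(["b"], chars) == [nothing]

# Converting the column (or the order vector) makes the lookup work.
@assert indexin(["b"], string.(chars)) == [2]
```

This is why the column and the order vector should share an element type before any of the sorting approaches in this thread are tried.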
This is actually kind of hard. I can’t think of a good solution using DataFrames.jl’s sort function. So instead here is a solution using DataFramesMeta.jl.
julia> df = DataFrame(col1= ["a", "b", "c"], col2 = [1,2,3]);
julia> order1 = ["b", "a", "c"];
julia> using DataFramesMeta
julia> @rorderby df findfirst(==(:col1), order1)
3×2 DataFrame
Row │ col1 col2
│ String Int64
─────┼───────────────
1 │ b 2
2 │ a 1
3 │ c 3
I hope someone thinks of something better!
3 Likes
Alternative without the DataFramesMeta package:
julia> df.order = [findfirst(==(x), order1) for x in df.col1]  # rank of each row's value in order1 (element types must match)
3-element Vector{Int64}:
2
1
3
julia> sort!(df, :order)
3×3 DataFrame
Row │ col1 col2 order
│ Char Int64 Int64
─────┼────────────────────
1 │ b 2 1
2 │ a 1 2
3 │ c 3 3
julia> select!(df, Not(:order))
3×2 DataFrame
Row │ col1 col2
│ Char Int64
─────┼─────────────
1 │ b 2
2 │ a 1
3 │ c 3
2 Likes
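The same ranking idea can be sketched without a temporary column. This is only an illustration on plain vectors, using Base Julia; the names `rank` and `perm` are made up here, and it assumes every value in the column appears in `order1`.

```julia
col1   = ["a", "b", "c"]
col2   = [1, 2, 3]
order1 = ["b", "a", "c"]

# Rank of every value in the desired order (a Dict avoids repeated scans).
rank = Dict(v => i for (i, v) in enumerate(order1))

# Permutation that sorts the rows by that rank.
perm = sortperm([rank[x] for x in col1])

sorted_col1 = col1[perm]
sorted_col2 = col2[perm]
```

With a DataFrame, the same permutation would be applied as `df[perm, :]`.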
You could also consider (the initial post had the argument order swapped; corrected now):
df[indexin(order1, df.col1),:]
result:
3×2 DataFrame
Row │ col1 col2
│ String Int64
─────┼───────────────
1 │ b 2
2 │ a 1
3 │ c 3
6 Likes
Xiao
September 17, 2021, 1:22am
5
Thanks Rafael,
Your code works on this example! But it doesn’t for the dataframe below:
julia> df2
5×2 DataFrame
Row │ sample reads
│ String Int64
─────┼─────────────────────────
1 │ RD58-100pg_S7 6096257
2 │ RD58-10ng_S10 12590390
3 │ RD58-10pg_S6 2216467
4 │ RD58-1ng_S9 5381029
5 │ RD58-500pg_S8 6870582
julia> myOrder = ["RD58-10pg_S6", "RD58-100pg_S7", "RD58-500pg_S8", "RD58-1ng_S9", "RD58-10ng_S10"]
5-element Vector{String}:
"RD58-10pg_S6"
"RD58-100pg_S7"
"RD58-500pg_S8"
"RD58-1ng_S9"
"RD58-10ng_S10"
julia> df2[indexin(df2.sample, myOrder),:]
5×2 DataFrame
Row │ sample reads
│ String Int64
─────┼─────────────────────────
1 │ RD58-10ng_S10 12590390
2 │ RD58-500pg_S8 6870582
3 │ RD58-100pg_S7 6096257
4 │ RD58-1ng_S9 5381029
5 │ RD58-10pg_S6 2216467
@pdeffebach’s code works well.
julia> @rorderby df2 findfirst(==(:sample), myOrder)
5×2 DataFrame
Row │ sample reads
│ String Int64
─────┼─────────────────────────
1 │ RD58-10pg_S6 2216467
2 │ RD58-100pg_S7 6096257
3 │ RD58-500pg_S8 6870582
4 │ RD58-1ng_S9 5381029
5 │ RD58-10ng_S10 12590390
Thanks!
Sorry, the order of the arguments needs to be swapped (corrected in the post above).
For your last example:
using DataFrames
col1 = ["RD58-100pg_S7", "RD58-10ng_S10", "RD58-10pg_S6", "RD58-1ng_S9", "RD58-500pg_S8"]
col2 = [6096257, 12590390, 2216467, 5381029,6870582]
df = DataFrame(col1=col1, col2=col2)
order1 = ["RD58-10pg_S6", "RD58-100pg_S7", "RD58-500pg_S8", "RD58-1ng_S9", "RD58-10ng_S10"]
Here is the result:
df[indexin(order1, df.col1),:]
5×2 DataFrame
Row │ col1 col2
│ String Int64
─────┼─────────────────────────
1 │ RD58-10pg_S6 2216467
2 │ RD58-100pg_S7 6096257
3 │ RD58-500pg_S8 6870582
4 │ RD58-1ng_S9 5381029
5 │ RD58-10ng_S10 12590390
1 Like
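For anyone wondering why the argument order matters: `indexin(a, b)` returns, for each element of `a`, the index of its first match in `b`. A rough sketch on plain vectors (Base Julia only; `rows` and `swapped` are illustrative names, and `Int.(...)` is added here so that a missing value raises an error instead of silently producing `nothing`):

```julia
sample  = ["RD58-100pg_S7", "RD58-10ng_S10", "RD58-10pg_S6", "RD58-1ng_S9", "RD58-500pg_S8"]
myOrder = ["RD58-10pg_S6", "RD58-100pg_S7", "RD58-500pg_S8", "RD58-1ng_S9", "RD58-10ng_S10"]

# For each wanted value, the row of `sample` that holds it: a row selector.
rows = Int.(indexin(myOrder, sample))
@assert sample[rows] == myOrder

# Swapped arguments give the rank of each row instead, which is not a row selector.
swapped = Int.(indexin(sample, myOrder))
@assert sample[swapped] != myOrder

# The two are inverse permutations, so sorting by rank recovers the selector.
@assert sortperm(swapped) == rows
```

The first example in the thread hid the mistake because swapping "a" and "b" is its own inverse, so both argument orders gave the same rows.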