Is it possible to sort a DataFrame column based on custom sort order, e.g.
df = DataFrame(col1= ['a','b','c'], col2 = [1,2,3])
sort!(df, col1=custom_order('a','c','b'))
Is it possible to sort a DataFrame column based on custom sort order, e.g.
df = DataFrame(col1= ['a','b','c'], col2 = [1,2,3])
sort!(df, col1=custom_order('a','c','b'))
You can always use
df[p, :]
where p
is a permutation that encodes your custom order, but if it can be specified by some user-supplied less-than function, you are better off using the sort
method for DataFrame
.
Elaborating on this SO question, I am trying to use a custom less-than function, to be used with the lt
sort keyword, but I donβt know how to provide to the function a third parameter that is the wanted βcustom orderβ (for multiple column sort):
For example, this works:
using DataFrames
df = DataFrame(
c1 = ['a','b','c','a','b','c'],
c2 = ["aa","aa","bb","bb","cc","cc"],
c3 = [1,2,3,10,20,30]
)
6Γ3 DataFrames.DataFrame
β Row β c1 β c2 β c3 β
βββββββΌββββββΌβββββββΌβββββ€
β 1 β 'a' β "aa" β 1 β
β 2 β 'b' β "aa" β 2 β
β 3 β 'c' β "bb" β 3 β
β 4 β 'a' β "bb" β 10 β
β 5 β 'b' β "cc" β 20 β
β 6 β 'c' β "cc" β 30 β
ordc2 = ["bb","aa","cc"]
function customLt(r1,r2)
return ( find(x -> x == r1, ordc2)[1] < find(x -> x == r2, ordc2)[1] )
end
sortedDf = sort(df, cols = [order(:c2, lt=customLt)])
6Γ3 DataFrames.DataFrame
β Row β c1 β c2 β c3 β
βββββββΌββββββΌβββββββΌβββββ€
β 1 β 'c' β "bb" β 3 β
β 2 β 'a' β "bb" β 10 β
β 3 β 'a' β "aa" β 1 β
β 4 β 'b' β "aa" β 2 β
β 5 β 'b' β "cc" β 20 β
β 6 β 'c' β "cc" β 30 β
But this doesnβt:
using DataFrames
df = DataFrame(
c1 = ['a','b','c','a','b','c'],
c2 = ["aa","aa","bb","bb","cc","cc"],
c3 = [1,2,3,10,20,30]
)
ordc1 = ['b','a','c']
ordc2 = ["bb","aa","cc"]
function customLt(r1,r2,col)
return ( find(x -> x == r1, col)[1] < find(x -> x == r2, col)[1] )
end
sortedDf = sort(df, cols = [order(:c2, lt=customLt(ordc2)),order(:c1, lt=customLt(ordc1))] )
MethodError: no method matching customLt(::Array{String,1})
Closest candidates are:
customLt(::Any, !Matched::Any) at /home/lobianco/git/ffsm_pp/00_private/2016_foretcc/data/output/test.jl:59
customLt(::Any, !Matched::Any, !Matched::Any) at /home/lobianco/git/ffsm_pp/00_private/2016_foretcc/data/output/test.jl:73
[...]
This should be faster for lookup:
using DataFrames
df = DataFrame(
c1 = ['a','b','c','a','b','c'],
c2 = ["aa","aa","bb","bb","cc","cc"],
c3 = [1,2,3,10,20,30]
)
ordc2 = ["bb","aa","cc"]
orderdict = Dict(x => i for (i,x) in enumerate(ordc2))
sort(df; cols = [:c2], by = x->orderdict[x])
Thank you⦠I managed to implement it for multiple columns and to place it with a function where users can select custom order (eventually partially) for each of the columns:
function customSort!(df::DataFrame, sortops)
sortv = []
sortOptions = []
if(isa(sortops, Array))
sortv = sortops
else
push!(sortv,sortops)
end
for i in sortv
if(isa(i, Tuple))
if (isa(i[2], Array)) # The second option is a custom order
orderArray = Array(collect(union( OrderedSet(i[2]), OrderedSet(unique(df[i[1]])) )))
push!(sortOptions, order(i[1], by = x->Dict(x => i for (i,x) in enumerate(orderArray))[x] ))
else # The second option is a reverse direction flag
push!(sortOptions, order(i[1], rev = i[2]))
end
else
push!(sortOptions, order(i))
end
end
return sort!(df, cols = sortOptions)
end
df = DataFrame(
c1 = ['a','b','c','a','b','c'],
c2 = ["aa","aa","bb","bb","cc","cc"],
c3 = [1,2,3,10,20,30],
)
customSort!(df, [(:c2,["bb","cc"]),(:c1,['b','a','c'])])
6Γ3 DataFrames.DataFrame
β Row β c1 β c2 β c3 β
βββββββΌββββββΌβββββββΌβββββ€
β 1 β 'a' β "bb" β 10 β
β 2 β 'c' β "bb" β 3 β
β 3 β 'b' β "cc" β 20 β
β 4 β 'c' β "cc" β 30 β
β 5 β 'b' β "aa" β 2 β
β 6 β 'a' β "aa" β 1 β
if someone needs it, I uploaded to github together with other utility functions on GitHub - sylvaticus/LAJuliaUtils.jl: Utility functions for Julia, mainly dataframes operations
To use it:
Pkg.clone("https://github.com/sylvaticus/LAJuliaUtils.jl.git")
using LAJuliaUtils
?customSort!
Another way to tackle this problem is to use a column type where you can control the sorting. The advantage comes when sorting is done behind the scenes (with grouping or plotting for example); the order will come out like you want.
Here is an example with PooledArrays.jl:
julia> d = PooledArray(["a", "b", "a"], ["b", "a"])
3-element PooledArrays.PooledArray{String,UInt32,1,Array{UInt32,1}}:
"a"
"b"
"a"
julia> sort(d)
3-element PooledArrays.PooledArray{String,UInt32,1,Array{UInt32,1}}:
"b"
"a"
"a"
Note that this trick doesnβt work with DataArrays
. I didnβt check if it works with CategoricalArrays
.