Hello,
This is literally my second question and day 3 of Julia, Previously i was a VBA guy with some Python, C++ and Matlab experience.
I have already searched both Google, Forum and Help file, but I don’t seem to be able to find a ranking function in Julia Dataframe.
Suppose that i have a dataframe as such:
Pkg.add(“DataFrames”)
using DataFrames
df = DataFrame(Types = [“SUV”, “SUV”,“SUV”, “SUV”, “sedan”,“sedan”], models=[“Q3”,“Q5”, “Kluger”, “Land Cruiser”, “Corolla”, “F40”], acceleration = [11,8, 8, 19, 5.5,3.3,])
6×3 DataFrame
Row │ Types models acceleration
│ String String Float64
─────┼────────────────────────────────────
1 │ SUV Q3 11.0
2 │ SUV Q5 8.0
3 │ SUV Kluger 8.0
4 │ SUV Land Cruiser 19.0
5 │ sedan Corolla 5.5
6 │ sedan F40 3.3
How do I add a column and rank the acceleration by “Types”?
Cheers,
Nelson
You could for example do:
# create df
julia> df = DataFrame(Types = ["SUV", "SUV","SUV", "SUV", "sedan","sedan"], models=["Q3","Q5", "Kluger", "Land Cruiser", "Coroll
a", "F40"], acceleration = [11,8, 8, 19, 5.5,3.3,])
6×3 DataFrame
Row │ Types models acceleration
│ String String Float64
─────┼────────────────────────────────────
1 │ SUV Q3 11.0
2 │ SUV Q5 8.0
3 │ SUV Kluger 8.0
4 │ SUV Land Cruiser 19.0
5 │ sedan Corolla 5.5
6 │ sedan F40 3.3
# group df by types
julia> gdf = groupby(df, "Types")
GroupedDataFrame with 2 groups based on key: Types
First Group (4 rows): Types = "SUV"
Row │ Types models acceleration
│ String String Float64
─────┼────────────────────────────────────
1 │ SUV Q3 11.0
2 │ SUV Q5 8.0
3 │ SUV Kluger 8.0
4 │ SUV Land Cruiser 19.0
⋮
Last Group (2 rows): Types = "sedan"
Row │ Types models acceleration
│ String String Float64
─────┼───────────────────────────────
1 │ sedan Corolla 5.5
2 │ sedan F40 3.3
# sort subdataframes by acceleration in descending order
julia> sort(gdf[1], "acceleration", rev = true)
4×3 DataFrame
Row │ Types models acceleration
│ String String Float64
─────┼────────────────────────────────────
1 │ SUV Land Cruiser 19.0
2 │ SUV Q3 11.0
3 │ SUV Q5 8.0
4 │ SUV Kluger 8.0
This will give you an ordered view of acceleration by types.
P.s.: It’s always a good idea to wrap your code snippets in backticks ``
to make them more easily readable for others.
Hello Fbanning,
Thank you very much for the speedy response.
I have 2 concerns:
Would sorting be a computational expensive compare to just assigning ranks? My real data is actually rather large (as everyone here would have claimed )
I really need the ranking in a new column called “ranking” for example, does Julia not have RANK function similar to that of Excel?
Thanks again,
Nelson
Sukera
July 22, 2021, 12:08pm
4
Sorting and finding a sorted order is the same task - the difference being that in one case the real list is sorted and in the other only a list of its indices is sorted. You may be looking for sortperm
, though it sounds like you’re trying to use julia 1:1 like you would use Excel, which may not be the best way to think of julia.
Maybe you can tell us more about what you’re trying to achieve in the big picture so we can help you better?
1 Like
You can use StatsBase.jl ranking functions to generate proper ranking column. For example
using DataFrames
using StatsBase
df = DataFrame(Types = ["SUV", "SUV","SUV", "SUV", "sedan","sedan"], models=["Q3","Q5", "Kluger", "Land Cruiser", "Corolla", "F40"], acceleration = [11,8, 8, 19, 5.5,3.3,])
sort!(df, :acceleration)
gdf = groupby(df, :Types)
transform(gdf, :acceleration => (x -> competerank(x, rev = true)) => :rank)
# 6×4 DataFrame
# Row │ Types models acceleration rank
# │ String String Float64 Int64
# ─────┼───────────────────────────────────────────
# 1 │ sedan F40 3.3 2
# 2 │ sedan Corolla 5.5 1
# 3 │ SUV Q5 8.0 3
# 4 │ SUV Kluger 8.0 3
# 5 │ SUV Q3 11.0 2
# 6 │ SUV Land Cruiser 19.0 1
I’ve used competerank
, but you can choose any other function, which suits your needs.
5 Likes
Sukera:
sortperm
HI Sukera,
I love this community, i asked a question, and i have multiple answers within 30 minutes.
Now you mentioned it, of course sorting and ranking would cost the same, silly me.
Yes thanks again!
Cheers
Nelson
Skoffer! Thank you, this is exactly what I need. it is in another package no wonder i couldnt find it in Help, perhaps i should learn to Google Better.
Thanks for your help. Marked as solution!
Cheers
Nelson
Skoffer:
sort!(df, :acceleration)
Maybe we could avoid this sorting step by writing:
gdf = groupby(df, :Types);
df[!,:rank] = vcat([competerank(g[!,:acceleration],rev=true) for g in gdf]...)
Wondering whether one could write this more simply and instead of using a comprehension, if it would be possible to broadcast competerank
acrosss the subdataframes of gdf
?
1 Like
Here is one more solution using DataFramesMeta.jl
julia> using DataFramesMeta, StatsBase;
julia> df = DataFrame(
Types = ["SUV", "SUV","SUV", "SUV", "sedan","sedan"],
models=["Q3","Q5", "Kluger", "Land Cruiser", "Corolla", "F40"],
acceleration = [11,8, 8, 19, 5.5,3.3,]);
julia> ranked = @chain df begin
groupby(:Types)
@transform :rank = competerank(:acceleration; rev=true)
end
6×4 DataFrame
Row │ Types models acceleration rank
│ String String Float64 Int64
─────┼───────────────────────────────────────────
1 │ SUV Q3 11.0 2
2 │ SUV Q5 8.0 3
3 │ SUV Kluger 8.0 3
4 │ SUV Land Cruiser 19.0 1
5 │ sedan Corolla 5.5 1
6 │ sedan F40 3.3 2
3 Likes