Ranking of Dataframe?

Hello,

This is literally my second question and day 3 of Julia, Previously i was a VBA guy with some Python, C++ and Matlab experience.

I have already searched both Google, Forum and Help file, but I don’t seem to be able to find a ranking function in Julia Dataframe.

Suppose that i have a dataframe as such:

Pkg.add(“DataFrames”)
using DataFrames
df = DataFrame(Types = [“SUV”, “SUV”,“SUV”, “SUV”, “sedan”,“sedan”], models=[“Q3”,“Q5”, “Kluger”, “Land Cruiser”, “Corolla”, “F40”], acceleration = [11,8, 8, 19, 5.5,3.3,])

6×3 DataFrame
Row │ Types models acceleration
│ String String Float64
─────┼────────────────────────────────────
1 │ SUV Q3 11.0
2 │ SUV Q5 8.0
3 │ SUV Kluger 8.0
4 │ SUV Land Cruiser 19.0
5 │ sedan Corolla 5.5
6 │ sedan F40 3.3

How do I add a column and rank the acceleration by “Types”?

Cheers,
Nelson

You could for example do:

# create df
julia> df = DataFrame(Types = ["SUV", "SUV","SUV", "SUV", "sedan","sedan"], models=["Q3","Q5", "Kluger", "Land Cruiser", "Coroll
a", "F40"], acceleration = [11,8, 8, 19, 5.5,3.3,])
6×3 DataFrame
 Row │ Types   models        acceleration
     │ String  String        Float64
─────┼────────────────────────────────────
   1 │ SUV     Q3                    11.0
   2 │ SUV     Q5                     8.0
   3 │ SUV     Kluger                 8.0
   4 │ SUV     Land Cruiser          19.0
   5 │ sedan   Corolla                5.5
   6 │ sedan   F40                    3.3

# group df by types
julia> gdf = groupby(df, "Types")
GroupedDataFrame with 2 groups based on key: Types
First Group (4 rows): Types = "SUV"
 Row │ Types   models        acceleration
     │ String  String        Float64
─────┼────────────────────────────────────
   1 │ SUV     Q3                    11.0
   2 │ SUV     Q5                     8.0
   3 │ SUV     Kluger                 8.0
   4 │ SUV     Land Cruiser          19.0
⋮
Last Group (2 rows): Types = "sedan"
 Row │ Types   models   acceleration
     │ String  String   Float64
─────┼───────────────────────────────
   1 │ sedan   Corolla           5.5
   2 │ sedan   F40               3.3

# sort subdataframes by acceleration in descending order
julia> sort(gdf[1], "acceleration", rev = true)
4×3 DataFrame
 Row │ Types   models        acceleration
     │ String  String        Float64
─────┼────────────────────────────────────
   1 │ SUV     Land Cruiser          19.0
   2 │ SUV     Q3                    11.0
   3 │ SUV     Q5                     8.0
   4 │ SUV     Kluger                 8.0

This will give you an ordered view of acceleration by types.

P.s.: It’s always a good idea to wrap your code snippets in backticks `` to make them more easily readable for others.

Hello Fbanning,

Thank you very much for the speedy response.

I have 2 concerns:

  1. Would sorting be a computational expensive compare to just assigning ranks? My real data is actually rather large (as everyone here would have claimed :slight_smile: )
  2. I really need the ranking in a new column called “ranking” for example, does Julia not have RANK function similar to that of Excel?

Thanks again,
Nelson

Sorting and finding a sorted order is the same task - the difference being that in one case the real list is sorted and in the other only a list of its indices is sorted. You may be looking for sortperm, though it sounds like you’re trying to use julia 1:1 like you would use Excel, which may not be the best way to think of julia.

Maybe you can tell us more about what you’re trying to achieve in the big picture so we can help you better?

1 Like

You can use StatsBase.jl ranking functions to generate proper ranking column. For example

using DataFrames
using StatsBase

df = DataFrame(Types = ["SUV", "SUV","SUV", "SUV", "sedan","sedan"], models=["Q3","Q5", "Kluger", "Land Cruiser", "Corolla", "F40"], acceleration = [11,8, 8, 19, 5.5,3.3,])

sort!(df, :acceleration)
gdf = groupby(df, :Types)
transform(gdf, :acceleration => (x -> competerank(x, rev = true)) => :rank)

# 6×4 DataFrame
#  Row │ Types   models        acceleration  rank
#      │ String  String        Float64       Int64
# ─────┼───────────────────────────────────────────
#    1 │ sedan   F40                    3.3      2
#    2 │ sedan   Corolla                5.5      1
#    3 │ SUV     Q5                     8.0      3
#    4 │ SUV     Kluger                 8.0      3
#    5 │ SUV     Q3                    11.0      2
#    6 │ SUV     Land Cruiser          19.0      1

I’ve used competerank, but you can choose any other function, which suits your needs.

5 Likes

HI Sukera,

I love this community, i asked a question, and i have multiple answers within 30 minutes.

Now you mentioned it, of course sorting and ranking would cost the same, silly me.

Yes thanks again!

Cheers
Nelson

Skoffer! Thank you, this is exactly what I need. it is in another package no wonder i couldnt find it in Help, perhaps i should learn to Google Better.

Thanks for your help. Marked as solution!

Cheers
Nelson

Maybe we could avoid this sorting step by writing:

gdf = groupby(df, :Types);
df[!,:rank] = vcat([competerank(g[!,:acceleration],rev=true) for g in gdf]...)

Wondering whether one could write this more simply and instead of using a comprehension, if it would be possible to broadcast competerank acrosss the subdataframes of gdf?

1 Like

Here is one more solution using DataFramesMeta.jl

julia> using DataFramesMeta, StatsBase;
julia> df = DataFrame(
    Types = ["SUV", "SUV","SUV", "SUV", "sedan","sedan"],
    models=["Q3","Q5", "Kluger", "Land Cruiser", "Corolla", "F40"], 
    acceleration = [11,8, 8, 19, 5.5,3.3,]);

julia> ranked = @chain df begin            
    groupby(:Types)
    @transform :rank = competerank(:acceleration; rev=true)
end
6×4 DataFrame
 Row │ Types   models        acceleration  rank  
     │ String  String        Float64       Int64 
─────┼───────────────────────────────────────────
   1 │ SUV     Q3                    11.0      2
   2 │ SUV     Q5                     8.0      3
   3 │ SUV     Kluger                 8.0      3
   4 │ SUV     Land Cruiser          19.0      1
   5 │ sedan   Corolla                5.5      1
   6 │ sedan   F40                    3.3      2
3 Likes