Rownumber() in Dataframe like SQL

AlexanderChen · November 26, 2021, 8:41am

Hi,

In SQL there is a function called rownumber() that you apply over a partition of a certain column to get a column numbered 1…n. Like this:

DataFrames has a function called rownumber, but it only returns an integer in specific cases. How do I do this in DataFrames?

nilshg · November 26, 2021, 9:34am

df.row_num = 1:nrow(df)

AlexanderChen · November 26, 2021, 10:32am

Hi,

Can I still partition over this? I need 2 row counts for two different columns.

best,

nilshg · November 26, 2021, 11:11am

Sorry, I don’t understand the question - what does “partition over this” mean? How would the row count be different for two columns when they’re in the same DataFrame?

AlexanderChen · November 26, 2021, 11:34am

Hi @nilshg,

sorry then I did not explain it correctly.

I have the current df sorted alphabetically on column ‘NAMES’.
Once done, I sort the same df on ‘NAMES_NUMBER’ and create a separate column that gives me the row numbers of that ordering. So one name will have two different row numbers. In SQL you can explicitly say that one row number goes over one column and another row number goes over another column. You do that by partitioning explicitly in both instances.

does that make sense?

best,

nilshg · November 26, 2021, 11:36am

Okay, if I understand correctly wouldn’t that just be:

sort!(df, :NAMES)
df.row_num_1 = 1:nrow(df)
sort!(df, :NAMES_NUMBER)
df.row_num_2 = 1:nrow(df)

?

AlexanderChen · November 26, 2021, 1:04pm

Yes, sometimes one gets stuck in tunnel vision :). Thanks!

jules · November 26, 2021, 2:04pm

~~Instead of sorting the whole dataframe just to record the row numbers of the sorted items, you could record the output of sortperm for both columns.~~

julia> df = DataFrame(a = rand(10), b = rand(10))
10×2 DataFrame
 Row │ a          b
     │ Float64    Float64
─────┼──────────────────────
   1 │ 0.710751   0.981133
   2 │ 0.407548   0.820994
   3 │ 0.560146   0.488446
   4 │ 0.851708   0.167793
   5 │ 0.0648273  0.309059
   6 │ 0.618235   0.818621
   7 │ 0.0239858  0.433564
   8 │ 0.741798   0.0706922
   9 │ 0.947348   0.719011
  10 │ 0.492506   0.430335

julia> transform(df, [:a, :b] .=> sortperm .=> [:rownumber_a, :rownumber_b])
10×4 DataFrame
 Row │ a          b          rownumber_a  rownumber_b
     │ Float64    Float64    Int64        Int64
─────┼────────────────────────────────────────────────
   1 │ 0.710751   0.981133             7            8
   2 │ 0.407548   0.820994             5            4
   3 │ 0.560146   0.488446             2            5
   4 │ 0.851708   0.167793            10           10
   5 │ 0.0648273  0.309059             3            7
   6 │ 0.618235   0.818621             6            3
   7 │ 0.0239858  0.433564             1            9
   8 │ 0.741798   0.0706922            8            6
   9 │ 0.947348   0.719011             4            2
  10 │ 0.492506   0.430335             9            1

Edit: Wait that’s not entirely correct, but it’s close in terms of the underlying idea. This just has the number you’d have to index at to receive the sorted vector…
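To get from that permutation to actual row numbers, one option is to invert it. A minimal sketch, using only Base Julia and a made-up vector: `sortperm` gives the indices that would sort the vector, and `invperm` of that result gives, for each element, its position in the sorted order.

```julia
# Made-up values, for illustration only.
v = [0.7, 0.4, 0.9, 0.1]

p = sortperm(v)           # indices that would sort v, so v[p] is sorted
rownumber = invperm(p)    # position of each element of v in the sorted order

# Sanity check: placing each element at its row number rebuilds the sorted vector.
sorted = similar(v)
sorted[rownumber] = v
```

In a DataFrame this could be applied per column, for example with a transformation such as `:a => (x -> invperm(sortperm(x))) => :rownumber_a`, where the target column name is just an illustration.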

rafael.guerra · November 26, 2021, 5:17pm

Trying to follow @jules here.
N is the index for Names and NN for NamesNum:

using DataFrames
Names = ["Beth", "Carl", "Ana", "Dan"]
NamesNum = [11, 5, 2, 7]
df = DataFrame(Names=Names, NamesNum=NamesNum)
n = nrow(df)  # must come after df is created

df.N, df.NN = sortperm.([df.Names, df.NamesNum])
df.N[df.N] .= 1:n
df.NN[df.NN] .= 1:n
df

 Row │ Names   NamesNum  N      NN    
     │ String  Int64     Int64  Int64 
─────┼────────────────────────────────
   1 │ Beth          11      2      4
   2 │ Carl           5      3      2
   3 │ Ana            2      1      1
   4 │ Dan            7      4      3

eotero · August 28, 2023, 4:21pm

Bogumil recommends the built-in function “eachindex” for row numbers within a partition:
combine(groupby(df, field), eachindex)

“RowNumber by Partition” function · Issue #3374 · JuliaData/DataFrames.jl (github.com)

Hope this helps.
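As a rough sketch of how this might look on a small table: the data and the column names `team`, `score`, `rn` and `rn_by_score` below are hypothetical, and the `eachindex => :rn` form assumes DataFrames.jl 1.4 or later. The second call does not rely on that feature and numbers rows within each group by a sort key, which is roughly what ROW_NUMBER() OVER (PARTITION BY team ORDER BY score) does in SQL.

```julia
using DataFrames

# Hypothetical data: `team` is the partition column, `score` the ordering column.
df = DataFrame(team = ["a", "b", "a", "b", "a"],
               score = [30, 10, 20, 40, 10])

# Row number within each team, in the order the rows currently appear.
# Assumes DataFrames.jl 1.4 or later, where `eachindex` is accepted here.
df1 = transform(groupby(df, :team), eachindex => :rn)

# Row number within each team, ordered by score: the inverse of the
# sorting permutation, computed separately for every group.
df2 = transform(groupby(df, :team),
                :score => (s -> invperm(sortperm(s))) => :rn_by_score)
```

Using `transform` rather than `combine` keeps all original columns and the original row order, so several such row-number columns can be added side by side.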
