Rownumber() in Dataframe like SQL

Hi,

In SQL you have a function called rownumber() over which you partition a certain column and get a column from 1 …n. Like this:

image

dataframes has a function called rownumber but it only return an integer in specific cases. How do I do this is Dataframes

df.row_num = 1:nrow(df)
1 Like

Hi,

Can I still partition over this? I need 2 row counts for two different columns.

best,

Sorry, I don’t understand the question - what does β€œpartition over this” mean? How would the row count be different for two columns when they’re in the same DataFrame?

Hi @nilshg,

sorry then I did not explain it correctly.

I have the current df ordered on alphabetical order on column β€˜NAMES’.
Once done I order the same df on β€˜NAMES_NUMBER’ and create a separate column that gives me the rownumbers of that order bunch. So 1 name will have 2 different rownumbers. In SQL you can explicitly say one rownumber goes over one column and another rownumber goes over another column. You do that by partitioning it explicitly for both instanstances.

does that make sense?

best,

Okay, if I understand correctly wouldn’t that just be:

sort!(df, :NAMES)
df.row_num_1= 1:nrow(df)
sort!(df, :NAMES_NUMBER)
df.row_num_2 = 1_nrow(df)

?

yes, sometimes one gets stuck in tunnelvision :). Thanks!

1 Like

Instead of sorting the whole dataframe just to record the row numbers of the sorted items, you could record the output of sortperm for both columns.

julia> df = DataFrame(a = rand(10), b = rand(10))
10Γ—2 DataFrame
 Row β”‚ a          b
     β”‚ Float64    Float64
─────┼──────────────────────
   1 β”‚ 0.710751   0.981133
   2 β”‚ 0.407548   0.820994
   3 β”‚ 0.560146   0.488446
   4 β”‚ 0.851708   0.167793
   5 β”‚ 0.0648273  0.309059
   6 β”‚ 0.618235   0.818621
   7 β”‚ 0.0239858  0.433564
   8 β”‚ 0.741798   0.0706922
   9 β”‚ 0.947348   0.719011
  10 β”‚ 0.492506   0.430335

julia> transform(df, [:a, :b] .=> sortperm .=> [:rownumber_a, :rownumber_b])
10Γ—4 DataFrame
 Row β”‚ a          b          rownumber_a  rownumber_b
     β”‚ Float64    Float64    Int64        Int64
─────┼────────────────────────────────────────────────
   1 β”‚ 0.710751   0.981133             7            8
   2 β”‚ 0.407548   0.820994             5            4
   3 β”‚ 0.560146   0.488446             2            5
   4 β”‚ 0.851708   0.167793            10           10
   5 β”‚ 0.0648273  0.309059             3            7
   6 β”‚ 0.618235   0.818621             6            3
   7 β”‚ 0.0239858  0.433564             1            9
   8 β”‚ 0.741798   0.0706922            8            6
   9 β”‚ 0.947348   0.719011             4            2
  10 β”‚ 0.492506   0.430335             9            1

Edit: Wait that’s not entirely correct, but it’s close in terms of the underlying idea. This just has the number you’d have to index at to receive the sorted vector…

1 Like

Trying to follow @jules here.
N is the index for Names and NN for NamesNum:

using DataFrames
Names = ["Beth", "Carl", "Ana", "Dan"]
NamesNum = [11, 5, 2, 7]
n = nrow(df)
df = DataFrame(Names=Names, NamesNum=NamesNum)

df.N, df.NN = sortperm.([df.Names, df.NamesNum])
df.N[df.N] .= 1:n
df.NN[df.NN] .= 1:n
df

 Row β”‚ Names   NamesNum  N      NN    
     β”‚ String  Int64     Int64  Int64 
─────┼────────────────────────────────
   1 β”‚ Beth          11      2      4
   2 β”‚ Carl           5      3      2
   3 β”‚ Ana            2      1      1
   4 β”‚ Dan            7      4      3
1 Like