Hi,
In SQL you have a function called rownumber() over which you partition a certain column and get a column from 1 β¦n. Like this:
dataframes has a function called rownumber but it only return an integer in specific cases. How do I do this is Dataframes
Hi,
Can I still partition over this? I need 2 row counts for two different columns.
best,
nilshg
November 26, 2021, 11:11am
4
Sorry, I donβt understand the question - what does βpartition over thisβ mean? How would the row count be different for two columns when theyβre in the same DataFrame?
1 Like
Hi @nilshg ,
sorry then I did not explain it correctly.
I have the current df ordered on alphabetical order on column βNAMESβ.
Once done I order the same df on βNAMES_NUMBERβ and create a separate column that gives me the rownumbers of that order bunch. So 1 name will have 2 different rownumbers. In SQL you can explicitly say one rownumber goes over one column and another rownumber goes over another column. You do that by partitioning it explicitly for both instanstances.
does that make sense?
best,
nilshg
November 26, 2021, 11:36am
6
Okay, if I understand correctly wouldnβt that just be:
sort!(df, :NAMES)
df.row_num_1= 1:nrow(df)
sort!(df, :NAMES_NUMBER)
df.row_num_2 = 1_nrow(df)
?
yes, sometimes one gets stuck in tunnelvision :). Thanks!
1 Like
jules
November 26, 2021, 2:04pm
8
Instead of sorting the whole dataframe just to record the row numbers of the sorted items, you could record the output of sortperm
for both columns.
julia> df = DataFrame(a = rand(10), b = rand(10))
10Γ2 DataFrame
Row β a b
β Float64 Float64
ββββββΌββββββββββββββββββββββ
1 β 0.710751 0.981133
2 β 0.407548 0.820994
3 β 0.560146 0.488446
4 β 0.851708 0.167793
5 β 0.0648273 0.309059
6 β 0.618235 0.818621
7 β 0.0239858 0.433564
8 β 0.741798 0.0706922
9 β 0.947348 0.719011
10 β 0.492506 0.430335
julia> transform(df, [:a, :b] .=> sortperm .=> [:rownumber_a, :rownumber_b])
10Γ4 DataFrame
Row β a b rownumber_a rownumber_b
β Float64 Float64 Int64 Int64
ββββββΌββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 0.710751 0.981133 7 8
2 β 0.407548 0.820994 5 4
3 β 0.560146 0.488446 2 5
4 β 0.851708 0.167793 10 10
5 β 0.0648273 0.309059 3 7
6 β 0.618235 0.818621 6 3
7 β 0.0239858 0.433564 1 9
8 β 0.741798 0.0706922 8 6
9 β 0.947348 0.719011 4 2
10 β 0.492506 0.430335 9 1
Edit: Wait thatβs not entirely correct, but itβs close in terms of the underlying idea. This just has the number youβd have to index at to receive the sorted vectorβ¦
1 Like
Trying to follow @jules here.
N
is the index for Names
and NN
for NamesNum
:
using DataFrames
Names = ["Beth", "Carl", "Ana", "Dan"]
NamesNum = [11, 5, 2, 7]
n = nrow(df)
df = DataFrame(Names=Names, NamesNum=NamesNum)
df.N, df.NN = sortperm.([df.Names, df.NamesNum])
df.N[df.N] .= 1:n
df.NN[df.NN] .= 1:n
df
Row β Names NamesNum N NN
β String Int64 Int64 Int64
ββββββΌββββββββββββββββββββββββββββββββ
1 β Beth 11 2 4
2 β Carl 5 3 2
3 β Ana 2 1 1
4 β Dan 7 4 3
1 Like
eotero
August 28, 2023, 4:21pm
10
Bogumil recommends the built-in function βeachindexβ for:
combine(groupby(df,field) , eachindex)
βRowNumber by Partitionβ function Β· Issue #3374 Β· JuliaData/DataFrames.jl (github.com)
Hope this helps.