Number groups in dataframe

This is probably very easy, but I just cannot figure it out. Let’s say I have a dataframe

julia> using DataFrames

julia> df = DataFrame(x = rand(6), g = ["c", "c", "a", "b", "b", "b"])
6×2 DataFrame
 Row │ x          g      
     │ Float64    String 
─────┼───────────────────
   1 │ 0.0276903  c
   2 │ 0.164615   c
   3 │ 0.526818   a
   4 │ 0.221024   b
   5 │ 0.610872   b
   6 │ 0.754117   b

In addition to column g, I want a column that numbers the groups in g in order of first appearance, such that

6×3 DataFrame
 Row │ x          g       gnum  
     │ Float64    String  Int64 
─────┼──────────────────────────
   1 │ 0.424061   c           1
   2 │ 0.851586   c           1
   3 │ 0.537859   a           2
   4 │ 0.694742   b           3
   5 │ 0.0538259  b           3
   6 │ 0.304558   b           3

How can I do this (preferably within a @chain)?

Thanks

You want groupindices

julia> transform(gd, groupindices => :gnum)
6×3 DataFrame
 Row │ x          g       gnum  
     │ Float64    String  Int64 
─────┼──────────────────────────
   1 │ 0.0980152  c           1
   2 │ 0.469093   c           1
   3 │ 0.81644    a           2
   4 │ 0.558021   b           3
   5 │ 0.218029   b           3
   6 │ 0.919571   b           3

where gd = groupby(df, :g)
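
Which number each group gets depends on how groupby orders the groups. Here is a minimal sketch, assuming a recent DataFrames.jl (I believe 1.4 or later is needed for the groupindices => :gnum form). The x column is replaced by fixed values and the variable names are only for illustration. Passing sort = false makes the "order of first appearance" numbering explicit, while sort = true numbers the groups by the sorted values of g.

```julia
using DataFrames

df = DataFrame(x = 1:6, g = ["c", "c", "a", "b", "b", "b"])

# sort = false: groups are created in order of first appearance (c, a, b)
gd = groupby(df, :g, sort = false)
by_appearance = transform(gd, groupindices => :gnum)
# by_appearance.gnum is 1, 1, 2, 3, 3, 3

# sort = true: groups are ordered by the values of g (a, b, c)
gd_sorted = groupby(df, :g, sort = true)
by_value = transform(gd_sorted, groupindices => :gnum)
# by_value.gnum is 3, 3, 1, 2, 2, 2
```

In both cases transform keeps the rows in their original order; only the numbers assigned to the groups differ.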

Doing

df = DataFrame(x = rand(6), g = ["c", "c", "a", "b", "b", "b"])
gd = groupby(df, :g)
transform(gd, groupindices => :gnum)

results in the following error:

ERROR: ArgumentError: Unrecognized column selector: DataFrames.groupindices => :gnum

It works if I do

df.gnum = groupindices(groupby(df, :g))

Is there also a way to use Chain.jl e.g.

df_new = @chain df begin
    groupby(_, :g)
    @transform :gnum = groupindices(_)
end

Thanks!

To use transform(gd, groupindices => :gnum), you need to update DataFrames.jl to its latest version.
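
If updating is not an option, the groupindices(gd) call that already worked above can also be used inside a @chain. This is only a sketch, assuming Chain.jl is installed; x is replaced by fixed values, and sort = false is my addition to make the numbering follow the order of first appearance.

```julia
using DataFrames, Chain

df = DataFrame(x = 1:6, g = ["c", "c", "a", "b", "b", "b"])

df_new = @chain df begin
    groupby(_, :g, sort = false)   # groups in order of first appearance
    groupindices(_)                # one group index per row of df
    insertcols(df, :gnum => _)     # non-mutating: returns a new data frame
end
# df_new.gnum is 1, 1, 2, 3, 3, 3
```

Unlike df.gnum = ..., insertcols leaves df itself unchanged.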

Thanks @alfaromartino, it works after the update!

Two things

  1. You can use the functions with Chain.jl. Chain.jl and DataFramesMeta.jl macros are not tied together in any way.
julia> df_new = @chain df begin
           groupby(_, :g)
           transform(groupindices => :gnum)
       end
6×3 DataFrame
 Row │ x          g       gnum  
     │ Float64    String  Int64 
─────┼──────────────────────────
   1 │ 0.465506   c           1
   2 │ 0.0945491  c           1
   3 │ 0.512721   a           2
   4 │ 0.444936   b           3
   5 │ 0.631455   b           3
   6 │ 0.667982   b           3

  2. You can also use it with @transform as follows

julia> df_new = @chain df begin
           groupby(_, :g)
           @transform :gnum = $groupindices
       end
6×3 DataFrame
 Row │ x          g       gnum  
     │ Float64    String  Int64 
─────┼──────────────────────────
   1 │ 0.465506   c           1
   2 │ 0.0945491  c           1
   3 │ 0.512721   a           2
   4 │ 0.444936   b           3
   5 │ 0.631455   b           3
   6 │ 0.667982   b           3

Thanks a lot @pdeffebach