pascal
1
This is probably very easy, but I just cannot figure it out. Let’s say I have a dataframe
df = DataFrame(x = rand(6), g = ["c", "c", "a", "b", "b", "b"])
6×2 DataFrame
Row │ x g
│ Float64 String
─────┼───────────────────
1 │ 0.0276903 c
2 │ 0.164615 c
3 │ 0.526818 a
4 │ 0.221024 b
5 │ 0.610872 b
6 │ 0.754117 b
In addition to column g, I want a column that numbers the groups in g, such that:
6×3 DataFrame
Row │ x g gnum
│ Float64 String Int64
─────┼──────────────────────────
1 │ 0.424061 c 1
2 │ 0.851586 c 1
3 │ 0.537859 a 2
4 │ 0.694742 b 3
5 │ 0.0538259 b 3
6 │ 0.304558 b 3
How can I do this (preferably within a @chain)?
Thanks
You want groupindices
julia> transform(gd, groupindices => :gnum)
6×3 DataFrame
Row │ x g gnum
│ Float64 String Int64
─────┼──────────────────────────
1 │ 0.0980152 c 1
2 │ 0.469093 c 1
3 │ 0.81644 a 2
4 │ 0.558021 b 3
5 │ 0.218029 b 3
6 │ 0.919571 b 3
bkamins
3
where gd = groupby(df, :g)
pascal
4
Doing
df = DataFrame(x = rand(6), g = ["c", "c", "a", "b", "b", "b"])
gd = groupby(df, :g)
transform(gd, groupindices => :gnum)
results in the following error:
ERROR: ArgumentError: Unrecognized column selector: DataFrames.groupindices => :gnum
It works if I do
df.gnum = groupindices(groupby(df, :g))
Is there also a way to do this with Chain.jl? For example:
df_new = @chain df begin
    groupby(_, :g)
    @transform :gnum = groupindices(_)
end
Thanks!
To use transform(gd, groupindices => :gnum), you need to update DataFrames to its latest version. See here.
pascal
6
Thanks @alfaromartino, it works after the update!
Two things
- You can use the functions with Chain.jl. Chain.jl and DataFramesMeta.jl macros are not tied together in any way.
julia> df_new = @chain df begin
           groupby(_, :g)
           transform(groupindices => :gnum)
       end
6×3 DataFrame
Row │ x g gnum
│ Float64 String Int64
─────┼──────────────────────────
1 │ 0.465506 c 1
2 │ 0.0945491 c 1
3 │ 0.512721 a 2
4 │ 0.444936 b 3
5 │ 0.631455 b 3
6 │ 0.667982 b 3
- You can also use it with @transform, as follows:
julia> df_new = @chain df begin
           groupby(_, :g)
           @transform :gnum = $groupindices
       end
6×3 DataFrame
Row │ x g gnum
│ Float64 String Int64
─────┼──────────────────────────
1 │ 0.465506 c 1
2 │ 0.0945491 c 1
3 │ 0.512721 a 2
4 │ 0.444936 b 3
5 │ 0.631455 b 3
6 │ 0.667982 b 3
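For intuition about where the gnum values come from, here is a minimal plain-Julia sketch that numbers groups in order of first appearance, which matches the outputs shown in this thread (c is 1, a is 2, b is 3). This is not how DataFrames implements groupindices, and number_groups is a hypothetical helper made up for illustration. It also assumes first-appearance ordering, which is what the outputs above show for this data.

```julia
# Hypothetical helper (not part of DataFrames): give each distinct value
# an integer id in the order in which it first appears.
function number_groups(g)
    ids = Dict{eltype(g), Int}()
    # get! returns the stored id if v was seen before; otherwise it stores
    # and returns the next unused id (current number of groups + 1).
    return [get!(ids, v, length(ids) + 1) for v in g]
end

g = ["c", "c", "a", "b", "b", "b"]
gnum = number_groups(g)
```

For the g column from the question this gives 1, 1, 2, 3, 3, 3, the same numbering as the gnum column above. In real code, groupindices is still the way to go, since it works directly on the grouped data frame.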