Create variable name in dataframe that is groupby key

Suppose we have the dataframe:

df = DataFrame(Dict(:name => ["AAA", "AAA", "AAA", "AAB", "AAB", "CCC", "DDE", "DDE"]))

Each unique string in :name represents a group. We can use the split-apply-combine strategy described in DataFrames.jl to perform operations on other variables within each group.

I would also like to create a variable in the original dataframe equal to the keys of the groupby result.

There are less direct ways of doing this. For example, I could do

df.grp_number = (x -> Dict(n => i for (i, n) in enumerate(unique(df.name)))[x]).(df.name)

and then do split-apply-combine strategies separately. But this feels inelegant.

Does there exist a syntax like:

combine(groupby(df, :name), key => :grp_number)

?

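First, note that a GroupedDataFrame can be reordered and subsetted. Because of this, your approach with unique is not correct in general: the position of a value in unique(df.name) does not have to match the index of its group. Having said that, there is no “simple” syntax to get the group number.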
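To illustrate the point about reordering and subsetting, here is a small sketch. The choice of groups 3 and 1 and the variable names sub, first_key and idx are only for illustration. It uses groupindices from DataFrames.jl, which reports for each row of the parent data frame the index of its group, or missing if the row belongs to no group.

```julia
using DataFrames

df = DataFrame(name = ["AAA", "AAA", "AAA", "AAB", "AAB", "CCC", "DDE", "DDE"])
gdf = groupby(df, :name)

# Keep only the third and the first group, in that order
sub = gdf[[3, 1]]

# Key of the group that is now first in `sub`
first_key = keys(sub)[1].name

# Group index for every row of the parent `df`;
# rows whose group was dropped get `missing`
idx = groupindices(sub)
```

After this, the group numbering in sub no longer follows the order in which names appear in df.name, which is why a number derived from unique can disagree with it.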

The way to get group keys is to use the keys function on a GroupedDataFrame. It will give you a mapping from group index to grouping variable value. I am not sure if this would work for you. I could help more if you explained how you want to use the group number later.
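A minimal sketch of what that could look like, assuming all you need is a per-row group index in the original data frame. The names group_names and grp_number are only illustrative. Besides keys, it uses groupindices, which DataFrames.jl exports and which may or may not fit your later use.

```julia
using DataFrames

df = DataFrame(name = ["AAA", "AAA", "AAA", "AAB", "AAB", "CCC", "DDE", "DDE"])
gdf = groupby(df, :name)

# keys(gdf)[i] is the key of the i-th group;
# its fields are the grouping columns, here just `name`
group_names = [k.name for k in keys(gdf)]

# For each row of the parent data frame, the index of the group it belongs to
df.grp_number = groupindices(gdf)
```

Here group_names[i] is the :name value of group i, so df.grp_number and group_names together give the mapping in both directions.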
