DataFrame grouped by a column: how to access a group by a particular value in that column

The group keys behave like (named) tuples of the grouping values (a one-element tuple if you group by a single column), so you can index into the GroupedDataFrame with such a tuple:

gdf = groupby(df, :city)

julia> gdf[("Berlin",)]
3×3 SubDataFrame
 Row │ city    date        v        
     │ String  String      Float64  
─────┼──────────────────────────────
   1 │ Berlin  10-16-2021  109.655
   2 │ Berlin  10-18-2021  100.57
   3 │ Berlin  10-19-2021   65.6764
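Since the keys are named, a NamedTuple should work as an index too, and `get` can be used when the group may not exist. A minimal sketch, assuming a recent DataFrames.jl; the toy `df` below is only an illustration (the Paris and London rows are made up, and "Rome" is a deliberately missing key):

```julia
using DataFrames

# Toy data for illustration only; the Paris and London rows are made up.
df = DataFrame(
    city = ["Paris", "London", "Berlin", "Berlin", "Berlin"],
    date = ["10-17-2021", "10-17-2021", "10-16-2021", "10-18-2021", "10-19-2021"],
    v    = [1.0, 2.0, 109.655, 100.57, 65.6764],
)
gdf = groupby(df, :city)

# A NamedTuple key names the grouping column explicitly.
berlin = gdf[(city = "Berlin",)]

# `get` returns the given default instead of throwing when no such group exists.
rome = get(gdf, ("Rome",), nothing)

# `haskey` checks for a group without retrieving it.
has_paris = haskey(gdf, ("Paris",))
```

The NamedTuple form is a bit more verbose, but it makes clear which column the value refers to.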

If you prefer a get_group function that avoids the tuple syntax, which is admittedly a bit ugly when there is just one column, you could define it like this:

julia> get_group(gdf, keys...) = gdf[(keys...,)]
get_group (generic function with 1 method)

julia> get_group(gdf, "Berlin")
3×3 SubDataFrame
 Row │ city    date        v        
     │ String  String      Float64  
─────┼──────────────────────────────
   1 │ Berlin  10-16-2021  109.655
   2 │ Berlin  10-18-2021  100.57
   3 │ Berlin  10-19-2021   65.6764
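Because get_group takes a variable number of arguments, the same helper should also cover grouping by several columns. A small sketch under that assumption; the toy `df` is illustrative only, and the values must be passed in the same order as the grouping columns:

```julia
using DataFrames

# Toy data for illustration only.
df = DataFrame(
    city = ["Berlin", "Berlin", "Paris"],
    date = ["10-16-2021", "10-18-2021", "10-16-2021"],
    v    = [109.655, 100.57, 1.0],
)

# Same helper as above: splat the values into a tuple key.
get_group(gdf, keys...) = gdf[(keys...,)]

# Group by two columns; each key is now a two-element tuple.
gdf2 = groupby(df, [:city, :date])

# Values are given in the order of the grouping columns (city, then date).
g = get_group(gdf2, "Berlin", "10-16-2021")
```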

You can also get the keys with the keys function, as you usually can for dictionaries and other key-based data structures in Julia:

julia> keys(gdf)
3-element DataFrames.GroupKeys{GroupedDataFrame{DataFrame}}:
 GroupKey: (city = "Paris",)
 GroupKey: (city = "London",)
 GroupKey: (city = "Berlin",)

You could collect that result if you need a vector.
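As a rough sketch of working with the keys: each GroupKey exposes the grouping values as properties, so you can pull out plain values, and `pairs` lets you iterate over keys and groups together. The toy `df` and the names `cities` and `counts` are just illustrative:

```julia
using DataFrames

# Toy data for illustration only.
df = DataFrame(city = ["Paris", "London", "Berlin", "Berlin"], v = [1.0, 2.0, 3.0, 4.0])
gdf = groupby(df, :city)

# Vector of GroupKey objects.
ks = collect(keys(gdf))

# Plain vector of the grouping values, read from each GroupKey by column name.
cities = [k.city for k in keys(gdf)]

# Iterate over key => group pairs, much like a dictionary.
counts = Dict(k.city => nrow(sdf) for (k, sdf) in pairs(gdf))
```

A GroupKey can also be used directly as an index, so `gdf[ks[1]]` gives back the corresponding group.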
