DataFrame grouped by a column; How to access a group by a particular value in that column

This is my first time using groupby on a DataFrame. Say I have a DataFrame grouped by city name:

using DataFrames

df = DataFrame(city = ["Paris", "London", "Paris", "Berlin", "London", "Berlin", "Berlin"],
               date = ["10-1$k-2021" for k in 3:9],
               v = 38 .+ 100 * rand(7))
gb = groupby(df, :city)

Is there a better way to get the list of cities associated with the groups (i.e. the equivalent of pandas gb.groups.keys())?
I retrieved them as follows:

city_name = String[]
for gr in gb 
   push!(city_name, gr[1, :city])
end

How could I access a group by its city name, not by its index? Is there a method like this pandas one:

gb.get_group("Berlin")

The group keys are (named) tuples of the respective values (a one-tuple if you group by one column), so you can index into the GroupedDataFrame with such a one-tuple:

gdf = groupby(df, :city)

julia> gdf[("Berlin",)]
3×3 SubDataFrame
 Row │ city    date        v        
     │ String  String      Float64  
─────┼──────────────────────────────
   1 │ Berlin  10-16-2021  109.655
   2 │ Berlin  10-18-2021  100.57
   3 │ Berlin  10-19-2021   65.6764
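Since the keys are named tuples, a NamedTuple should work as an index too, and haskey can guard against asking for a group that does not exist. A minimal sketch, assuming a recent DataFrames.jl release; the small df below is made-up sample data, not the one from the question:

```julia
using DataFrames

# Made-up sample data, for illustration only
df = DataFrame(city = ["Paris", "London", "Paris", "Berlin"],
               v = [1.0, 2.0, 3.0, 4.0])
gdf = groupby(df, :city)

# Index with a NamedTuple instead of a plain tuple
berlin = gdf[(city = "Berlin",)]

# Check that a key exists before indexing, to avoid a KeyError
paris = haskey(gdf, ("Paris",)) ? gdf[("Paris",)] : nothing
has_rome = haskey(gdf, ("Rome",))
```

The NamedTuple form is a bit more verbose, but it keeps the column name visible at the call site, which can help when grouping by several columns.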

If you prefer a get_group function that avoids the tuple syntax, which is admittedly a bit ugly when there is only one grouping column, you could define one like this:

julia> get_group(gdf, keys...) = gdf[(keys...,)]
get_group (generic function with 1 method)

julia> get_group(gdf, "Berlin")
3×3 SubDataFrame
 Row │ city    date        v        
     │ String  String      Float64  
─────┼──────────────────────────────
   1 │ Berlin  10-16-2021  109.655
   2 │ Berlin  10-18-2021  100.57
   3 │ Berlin  10-19-2021   65.6764

You can also get the keys with the keys function, as you usually can for dictionaries and other key-based data structures in Julia:

julia> keys(gdf)
3-element DataFrames.GroupKeys{GroupedDataFrame{DataFrame}}:
 GroupKey: (city = "Paris",)
 GroupKey: (city = "London",)
 GroupKey: (city = "Berlin",)

You could collect that result if you need a vector.
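To get plain city names rather than GroupKey objects, one option is to read the grouping column off each key. A minimal sketch, again on made-up sample data, assuming the GroupKey property access (key.city) and the NamedTuple conversion available in recent DataFrames.jl releases:

```julia
using DataFrames

# Made-up sample data, for illustration only
df = DataFrame(city = ["Paris", "London", "Paris", "Berlin"],
               v = [1.0, 2.0, 3.0, 4.0])
gdf = groupby(df, :city)

# One city name per group, as a plain Vector{String}
city_names = [key.city for key in keys(gdf)]

# Each GroupKey can also be converted to an ordinary NamedTuple
key_tuples = [NamedTuple(key) for key in keys(gdf)]
```

This replaces the push! loop from the question with a single comprehension over keys(gdf).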
