Iterating over (key, value) with grouped DataFrame

Not sure if I’m being dense here, but is there a simple way to iterate over a GroupedDataFrame, having the grouping levels available at each iteration? i.e. something like

using DataFrames

df = DataFrame(a = [1,1,2,2], b = randn(4))
gdf = groupby(df, :a)

for (keys, subdf) in iterator_im_looking_for(gdf)
    println((keys, subdf))
end
# Desired output: (NamedTuple, SubDataFrame) pairs, or something similar
#
#((a=1,), 2×2 SubDataFrame
#│ Row │ a     │ b        │
#│     │ Int64 │ Float64  │
#├─────┼───────┼──────────┤
#│ 1   │ 1     │ 0.109089 │
#│ 2   │ 1     │ 0.107033 │)
#((a=2,), 2×2 SubDataFrame
#│ Row │ a     │ b        │
#│     │ Int64 │ Float64  │
#├─────┼───────┼──────────┤
#│ 1   │ 2     │ 1.29613  │
#│ 2   │ 2     │ -2.33027 │)

Is this what you are looking for? https://github.com/JuliaData/DataFrames.jl/pull/1908.

Until that is released, you can get the grouping variables as a data frame using, e.g., collect(x -> first(x)[groupvars(parent(x))], gdf)

Ha, that’s exactly what I’m looking for. Thanks!

The collect snippet above doesn’t work, but this does:

[parent(gdf)[i, groupvars(gdf)] for i in gdf.starts]

Sorry, I was writing from memory and mixed up collect with combine. Here is a working example:

select!(combine(first, gdf), groupvars(gdf))

The solution from the linked PR using gdf.starts works, but it relies on the internal, undocumented starts field, which is not guaranteed to be supported in the future.
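For later readers, here is a minimal sketch of the loop from the original question. It assumes a DataFrames.jl release in which keys(gdf) and pairs(gdf) are defined for a GroupedDataFrame (I believe this is the case in the 1.x series, but check your installed version). The NamedTuple(key) conversion is there only to match the output format asked for; it is not required.

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2], b = randn(4))
gdf = groupby(df, :a)

# Assumption: pairs(gdf) is available and yields key => SubDataFrame pairs,
# where each key holds the values of the grouping columns for that group.
for (key, subdf) in pairs(gdf)
    # Convert the key to a NamedTuple such as (a = 1,) for printing
    println((NamedTuple(key), subdf))
end

# The keys alone, without the sub-data-frames
group_keys = [NamedTuple(k) for k in keys(gdf)]
```

This goes through the public iteration interface only, so it does not touch the internal starts field.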
