Not sure if I’m being dense here, but is there a simple way to iterate over a GroupedDataFrame with the grouping levels available at each iteration? I.e., something like:
using DataFrames
df = DataFrame(a = [1,1,2,2], b = randn(4))
gdf = groupby(df, :a)
for (keys, subdf) in iterator_im_looking_for(gdf)
    println((keys, subdf))
end
# Desired output: (NamedTuple, SubDataFrame) pairs, or something similar
#
#((a=1,), 2×2 SubDataFrame
#│ Row │ a │ b │
#│ │ Int64 │ Float64 │
#├─────┼───────┼──────────┤
#│ 1 │ 1 │ 0.109089 │
#│ 2 │ 1 │ 0.107033 │)
#((a=2,), 2×2 SubDataFrame
#│ Row │ a │ b │
#│ │ Int64 │ Float64 │
#├─────┼───────┼──────────┤
#│ 1 │ 2 │ 1.29613 │
#│ 2 │ 2 │ -2.33027 │)
Sorry, I was writing from memory and mixed up collect with combine. Here is an example that works:
select!(combine(first, gdf), groupvars(gdf))
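To connect that one-liner back to the original question, here is a rough sketch of how the resulting key table could be paired with the groups themselves. It assumes that combine returns one row per group in the same order in which gdf iterates its groups; the names keydf and seen are just illustrative, and b is filled with fixed values instead of randn so the result is reproducible.

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2], b = [0.1, 0.2, 0.3, 0.4])
gdf = groupby(df, :a)

# One row per group, keeping only the grouping columns (the one-liner above).
keydf = select!(combine(first, gdf), groupvars(gdf))

# Walk the key rows and the groups in lockstep.
# Assumption: the rows of keydf are in the same order as the groups of gdf.
seen = []
for (key, subdf) in zip(eachrow(keydf), gdf)
    push!(seen, (NamedTuple(key), nrow(subdf)))
    println((NamedTuple(key), subdf))
end
```

Each iteration gives a NamedTuple of the grouping values together with the matching SubDataFrame, using only exported functions.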
The solution from the linked PR using gdf.starts works, but it relies on the internal, undocumented starts field, which is not guaranteed to be supported in the future.
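For anyone reading this later: as far as I know, newer DataFrames.jl releases expose the group keys through the public API, so the internal field should not be needed. The sketch below assumes a version where keys(gdf) and pairs(gdf) are defined for a GroupedDataFrame; on older versions these calls may not exist. The name collected is just illustrative.

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2], b = [0.1, 0.2, 0.3, 0.4])
gdf = groupby(df, :a)

# pairs(gdf) yields key => SubDataFrame pairs; each key converts to a NamedTuple.
# Assumption: the installed DataFrames.jl version supports pairs on a GroupedDataFrame.
collected = []
for (key, subdf) in pairs(gdf)
    push!(collected, (NamedTuple(key), nrow(subdf)))
    println((NamedTuple(key), subdf))
end
```

This gives the (NamedTuple, SubDataFrame) pairs asked for in the first post without touching any undocumented fields.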