Is there a reason for groupby not supporting a GroupedDataFrame as an input?
In my use case, applying transforms based on different groupings in sequence hits an error:
dfg = groupby(df, "eid")
transform!(dfg, "amount" => ((x) -> cumsum(x)) => "amount_cumsum")
dfg = groupby(dfg, "date")
ERROR: MethodError: no method matching groupby(::GroupedDataFrame{DataFrame}, ::Array{String,1})
Closest candidates are:
groupby(::AbstractDataFrame, ::Any; sort, skipmissing) at C:\Users\jerem\.julia\packages\DataFrames\oQ5c7\src\groupeddataframe\groupeddataframe.jl:187
A solution could be to use the non-mutating transform along with ungroup=true, but on large data it is a very costly operation (in both time and RAM):
dfg = groupby(df, "eid")
df1 = transform(dfg, "amount" => ((x) -> cumsum(x)) => "amount_cumsum", ungroup=true)
dfg = groupby(df1, "date")
A way to circumvent this seems to be to call the second groupby on the parent of the GroupedDataFrame:
dfg = groupby(df, "eid")
transform!(dfg, "amount" => ((x) -> cumsum(x)) => "amount_cumsum")
dfg = groupby(dfg.parent, "date")
However, I have doubts about whether this latter approach might be exposed to some undesired side effects. If it's legit, then wouldn't it be desirable for groupby to handle a GroupedDataFrame as an input?
I might have some anchoring with regard to the behavior of R's data.table, where a transformation can be called sequentially on different groupings through dt[, ..., by = "eid"] and dt[, ..., by = "date"]. In DataFrames.jl, can it be assumed that to remove the grouping key, doing df = dfg.parent would be equivalent to data.table's setkey(dt, NULL)?
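For reference, here is a minimal sketch of the workaround on toy data. The table contents are made up for illustration, and I use parent(dfg) rather than direct field access, on the assumption that it is the intended accessor in the DataFrames.jl version at hand. It only shows what I observe, not that the behavior is guaranteed:

```julia
using DataFrames

# Hypothetical toy data, only to illustrate the pattern
df = DataFrame(eid = [1, 1, 2, 2],
               date = ["d1", "d2", "d1", "d2"],
               amount = [10, 20, 30, 40])

# First grouping: cumulative sum of amount within each eid, written in place
dfg = groupby(df, "eid")
transform!(dfg, "amount" => cumsum => "amount_cumsum")

# The parent of the GroupedDataFrame is the original table, not a copy,
# so the new column is visible on df itself
same_object = parent(dfg) === df

# Second grouping on the parent, now keyed by date
dfg2 = groupby(parent(dfg), "date")
```

If same_object is true, the second groupby is simply regrouping the original df with the new column attached, which is what I would expect dt[, ..., by = "date"] to do in data.table.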