Recommended equivalent to map / broadcast on GroupedDataFrame

If we have a function f1 that takes a DataFrame as its argument and returns a scalar, how can a reduction using f1 be applied to each SubDataFrame of a grouped data frame?

using DataFrames

# reduce function
f1(df::AbstractDataFrame) = sum(df[!, :x1])

# data with `y` as key
df1 = DataFrame(rand(100, 3), :auto)
df1.y = rand(1:3, 100)
df1g = groupby(df1, :y)

Ideally, I would pass each SubDataFrame directly to the aggregating function in combine, with something like:

combine(df1g, AsDF() => f1 => :z)

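The following works, but it is an inefficient workaround, because each group's columns arrive as a NamedTuple and have to be converted back into a new DataFrame: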

combine(df1g, AsTable(:) => (x -> f1(DataFrame(x))) => :z)
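For this particular f1, which only reads x1, the usual source => function => destination syntax already avoids the conversion: each group's x1 column is passed straight to sum. A minimal sketch with small made-up data; it only helps when the reduction needs a known set of columns rather than a whole SubDataFrame.

```julia
using DataFrames

# Small deterministic example data (made up for illustration)
df1 = DataFrame(x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                x2 = zeros(6),
                y  = [1, 2, 3, 1, 2, 3])
df1g = groupby(df1, :y)

# f1 only needs :x1, so hand that column to `sum` directly;
# no per-group DataFrame is constructed
res = combine(df1g, :x1 => sum => :z)
```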

Otherwise, map and broadcasting seemed like natural options, but both are reserved operations:

julia> map(f1, df1g)
ERROR: ArgumentError: using map over `GroupedDataFrame`s is reserved

julia> f1.(df1g)
ERROR: ArgumentError: broadcasting over `GroupedDataFrame`s is reserved

A loop also seems to be an efficient option, but it does not feel like an “elegant” solution:

# collect group key => result pairs, one per group
out = Pair[]
for idx in eachindex(df1g)
    push!(out, idx => f1(df1g[idx]))
end
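If only the key => value pairs from that loop are needed, a comprehension over `pairs(df1g)` builds them in one expression, and with a concrete element type instead of `Pair[]`. A sketch assuming the f1 from the question and small made-up data; the Dict variant is an illustrative extra that reads the grouping value from each GroupKey.

```julia
using DataFrames

f1(df::AbstractDataFrame) = sum(df[!, :x1])

# Small deterministic example data (made up for illustration)
df1 = DataFrame(x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], y = [1, 2, 3, 1, 2, 3])
df1g = groupby(df1, :y)

# pairs(gdf) yields GroupKey => SubDataFrame, one pair per group
out = [k => f1(sdf) for (k, sdf) in pairs(df1g)]

# a GroupKey behaves like a NamedTuple, so k.y is the raw grouping value
outdict = Dict(k.y => f1(sdf) for (k, sdf) in pairs(df1g))
```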

Is there a more straightforward way to perform the desired reduction over a grouped data frame that I have missed?

If I understood the requirement correctly, you could try the following form:

combine(f1, df1g)

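Because combine accepts the function as its first argument, Julia's do-block syntax also works, which is handy when the per-group function is short and not worth naming. A minimal sketch on small made-up data; the scalar returned by the block ends up in an automatically named column after the grouping column.

```julia
using DataFrames

# Small deterministic example data (made up for illustration)
df1 = DataFrame(x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], y = [1, 2, 3, 1, 2, 3])
df1g = groupby(df1, :y)

# the do block is the function f in combine(f, df1g);
# it receives one SubDataFrame per group
res = combine(df1g) do sdf
    sum(sdf.x1)
end
```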
If you want to rename the new column, you could return a NamedTuple from the function instead, like this:

f2(df::AbstractDataFrame) = (;z=sum(df[!, :x1]))
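Passed to combine in the same function-first form, the field name of the returned NamedTuple becomes the output column name. A short usage sketch with small made-up data:

```julia
using DataFrames

f2(df::AbstractDataFrame) = (; z = sum(df[!, :x1]))

# Small deterministic example data (made up for illustration)
df1 = DataFrame(x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], y = [1, 2, 3, 1, 2, 3])
df1g = groupby(df1, :y)

# the NamedTuple field `z` becomes the result column next to the key column `y`
res = combine(f2, df1g)
```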

Yes, combine is the intended method if you want a data frame as the output.
If you want a vector, use a comprehension:

[f(sdf) for sdf in gdf]

This ambiguity about the type of the resulting object (a data frame or a vector) is the reason why map is not implemented for GroupedDataFrame.
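The two shapes can be seen side by side: combine returns a DataFrame that keeps the grouping column, while the comprehension returns a plain Vector with one element per group, in the same group order. A sketch assuming the f1 from the question and small made-up data:

```julia
using DataFrames

f1(df::AbstractDataFrame) = sum(df[!, :x1])

# Small deterministic example data (made up for illustration)
df1 = DataFrame(x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], y = [1, 2, 3, 1, 2, 3])
df1g = groupby(df1, :y)

# table-shaped result: one row per group, grouping column kept
as_df = combine(f1, df1g)

# vector-shaped result: one element per group, keys dropped
as_vec = [f1(sdf) for sdf in df1g]
```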

Thank you! I had completely overlooked the combine(fun, df) method, as I had come to think of these verbs strictly in the form combine(df, ops...). Much appreciated!


