Given a function f1 that takes a DataFrame as an argument and returns a scalar, how can a reduction using f1 be applied to each SubDataFrame of a grouped data frame?
using DataFrames
# reduce function
f1(df::AbstractDataFrame) = sum(df[!, :x1])
# data with `y` as key
df1 = DataFrame(rand(100, 3), :auto)
df1.y = rand(1:3, 100)
df1g = groupby(df1, :y)
Ideally, each SubDataFrame could be passed directly as input to the aggregating function in combine, with something like the following (AsDF() is hypothetical here):
combine(df1g, AsDF() => f1 => :z)
The following works, but it is an inefficient workaround, since every group is first turned into a NamedTuple and then copied into a new DataFrame:
combine(df1g, AsTable(:) => (x -> f1(DataFrame(x))) => :z)
Otherwise, map and broadcasting seemed like natural options, but both are reserved operations:
julia> map(f1, df1g)
ERROR: ArgumentError: using map over `GroupedDataFrame`s is reserved
julia> f1.(df1g)
ERROR: ArgumentError: broadcasting over `GroupedDataFrame`s is reserved
Looping over the groups appears to be the efficient option, but it doesn’t feel like an “elegant” solution:
out = Pair[]
for idx in eachindex(df1g)
    push!(out, idx => f1(df1g[idx]))
end
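For comparison, the same loop can be condensed into a comprehension. This is only a sketch: it assumes that pairs(df1g) yields group key => SubDataFrame pairs, and it is still explicit iteration rather than a combine-based solution.

```julia
using DataFrames

# Same setup as above
f1(df::AbstractDataFrame) = sum(df[!, :x1])
df1 = DataFrame(rand(100, 3), :auto)
df1.y = rand(1:3, 100)
df1g = groupby(df1, :y)

# Iterate over (group key, SubDataFrame) pairs and apply f1 to each group
out = [key => f1(sdf) for (key, sdf) in pairs(df1g)]
```

Each element of out pairs a group key with the scalar returned by f1, so the result holds the same information as the loop version.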
Is there a more straightforward way to perform the desired reduction over a grouped dataframe that I missed?