How to dispatch a function aggregating columns in a grouped DataFrame?

I want to aggregate the columns of a DataFrame based on their element types, i.e., something like:

function aggregate(v::Vector{T}) where T <: Number
    sum(v)
end

function aggregate(v::Vector{String})
    [unique(v)]
end

df = DataFrame(idx = [1, 2, 3], a = [10, 20, 30], b = ["text1", "text2", "text3"]);
test1 = combine(df, names(df) .=> aggregate)
test2 = combine(groupby(df, :idx), names(df) .=> aggregate)

What is the best way to make this work for the grouped DataFrame? The columns don’t seem to be simple vectors after applying groupby, but some kind of SubArray. What’s the cleanest way to make custom functions work for those types?

Indeed, after groupby the function gets called with a SubArray rather than a Vector. I don’t think that’s documented; it’s a bit of an implementation detail. A SubArray is still an AbstractVector, though. You have for example:

julia> SubArray{String, 1, Vector{String}, Tuple{Vector{Int64}}, false} <: AbstractVector{String}
true

so you can define the methods for AbstractVector rather than Vector:

function agg(v::AbstractVector{<:Number})
   sum(v)
end

function agg(v::AbstractVector{String})
   [unique(v)]
end

This works:

using DataFrames

df = DataFrame(idx = [1, 2, 3], a = [10, 20, 30], b = ["text1", "text2", "text3"]);

julia> test1 = combine(df, names(df) .=> agg)
1×3 DataFrame
 Row │ idx_agg  a_agg  b_agg
     │ Int64    Int64  Array…
─────┼─────────────────────────────────────────────
   1 │       6     60  ["text1", "text2", "text3"]

julia> test2 = combine(groupby(df, :idx), names(df) .=> agg)
3×4 DataFrame
 Row │ idx    idx_agg  a_agg  b_agg
     │ Int64  Int64    Int64  Array…
─────┼──────────────────────────────────
   1 │     1        1     10  ["text1"]
   2 │     2        2     20  ["text2"]
   3 │     3        3     30  ["text3"]
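
To see why the original Vector methods fail on grouped data, here is a minimal sketch using only Base (no DataFrames). The names agg_strict and agg_generic are hypothetical, made up for this illustration, and view with an index vector is only assumed to resemble what groupby passes in:

```julia
# Hypothetical helper names, used only for this illustration.
agg_strict(v::Vector{T}) where {T<:Number} = sum(v)   # concrete Vector only
agg_generic(v::AbstractVector{<:Number}) = sum(v)     # any vector-like container

col = [10, 20, 30]
sub = view(col, [1, 3])   # a SubArray indexed by a Vector{Int}

@assert sub isa SubArray
@assert sub isa AbstractVector{<:Number}
@assert !(sub isa Vector)

total = agg_generic(sub)                 # dispatch succeeds on the view
strict_ok = applicable(agg_strict, sub)  # false: no method accepts a SubArray
```

Widening the signature to AbstractVector costs nothing for plain vectors and keeps the method usable for views.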

In typical usage, though, you can skip the custom function and select the columns by type directly:

julia> test2 = combine(groupby(df, :idx), names(df, Number) .=> sum, names(df, String) .=> Ref∘unique)
3×4 DataFrame
 Row │ idx    idx_sum  a_sum  b_Ref_unique
     │ Int64  Int64    Int64  Array…
─────┼─────────────────────────────────────
   1 │     1        1     10  ["text1"]
   2 │     2        2     20  ["text2"]
   3 │     3        3     30  ["text3"]
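
A note on why Ref (or the extra brackets in [unique(v)]) is there: when a function passed to combine returns a vector, the result is spread over several rows, whereas wrapping it keeps one row per group with the whole vector in a single cell. A small sketch, assuming a toy table of my own with a repeated idx value so the difference is visible:

```julia
using DataFrames

# Toy data (not from the question): group 1 has two rows, group 2 has one.
df = DataFrame(idx = [1, 1, 2], b = ["x", "y", "z"])
gdf = groupby(df, :idx)

# Bare vector result: each unique value becomes its own row.
expanded = combine(gdf, :b => unique)

# Wrapped result: one row per group, the cell holds the vector of unique values.
wrapped = combine(gdf, :b => Ref∘unique)

n_expanded = nrow(expanded)
n_wrapped = nrow(wrapped)
first_cell = wrapped.b_Ref_unique[1]
```

In the example from the question every group has exactly one row, so both forms would give three rows there and the difference only shows up in the column type.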