Using DataFrames `combine`, is there a way to programmatically pass multiple functions to apply to the same column?

Is there a way to pass a Vector{Function} like this:
methods = [mean, maximum]
to DataFrames combine? I was thinking something like this?
df = combine(groupdf, nrow, 3 => methods)
or
df = combine(groupdf, nrow, 3 => methods...)

try broadcasting over the functions with .=> maybe?
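
A minimal sketch of that suggestion, assuming a small made-up table (the names `df`, `groupdf`, `funcs` and the column names are illustrative, not from the original post). Broadcasting `.=>` treats the single column as a scalar and pairs it with each function in turn:

```julia
using DataFrames, Statistics

# Hypothetical toy data, only for illustration
df = DataFrame(g = [1, 1, 2, 2], x = [1.0, 2.0, 3.0, 4.0])
groupdf = groupby(df, :g)
funcs = [mean, maximum]

# :x .=> funcs expands to [:x => mean, :x => maximum]
result = combine(groupdf, nrow, :x .=> funcs)
```

The result should have one row per group, with the columns `nrow`, `x_mean` and `x_maximum` next to the grouping column.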


brilliant, so intuitive, thanks… I love Julia :wink:

any suggestions for programmatically applying multiple methods to multiple columns?

for

methods = [mean, maximum]

and

cols = [:x, :y]

df = combine(groupdf, cols .=> methods) gives results of x_mean and y_maximum

df = combine(groupdf, cols[1] .=> methods) gives results of x_mean and x_maximum

df = combine(groupdf, cols[2] .=> methods) gives results of y_mean and y_maximum

but I can't seem to find a solution that gives x_mean, x_maximum, y_mean, y_maximum

the third solution here worked:

 combine(gdf, nrow, [n => f for n in cols for f in methods])

make a poor man's cartesian product

julia> df = DataFrame([:a => [1,2,3], :b => [4,5,6]])
3×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> select(df, repeat([:a, :b], inner=2) .=> repeat([maximum, minimum], outer=2))
3×4 DataFrame
 Row │ a_maximum  a_minimum  b_maximum  b_minimum 
     │ Int64      Int64      Int64      Int64     
─────┼────────────────────────────────────────────
   1 │         3          1          6          4
   2 │         3          1          6          4
   3 │         3          1          6          4

the generator you found looks good too!

The simplest way is to change methods = [mean, maximum] to methods = [mean maximum] (no comma), which makes it a 1×2 Matrix instead of a Vector. Then, if you run combine(groupdf, cols .=> methods), you get all four columns: x_mean, x_maximum, y_mean, y_maximum.
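
Here is a short sketch of the no-comma version, again on made-up data (`df`, `groupdf`, `funcs` and `ops` are illustrative names). It only checks that every column/function combination is produced; the order of the resulting columns is not asserted here:

```julia
using DataFrames, Statistics

# Hypothetical toy data, only for illustration
df = DataFrame(g = [1, 1, 2, 2],
               x = [1.0, 2.0, 3.0, 4.0],
               y = [10.0, 20.0, 30.0, 40.0])
groupdf = groupby(df, :g)

cols = [:x, :y]          # 2-element Vector (a column)
funcs = [mean maximum]   # 1×2 Matrix (a row), note the missing comma

# Column against row broadcasts to a 2×2 Matrix of pairs
ops = cols .=> funcs
result = combine(groupdf, ops)
```

The list comprehension found earlier in the thread builds the same set of pairs explicitly, so which one to use is mostly a matter of taste.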


This is fantastic… are you able to explain why a Vector of functions needs to be iterated over while a Matrix of functions does not?

You can see more about it here:

Just in case, that blog is written by @bkamins, who is one of the developers of DataFrames. It's really helpful.


See:

julia> ["r1", "r2", "r3"] .=> ["c1" "c2"]
3×2 Matrix{Pair{String, String}}:
 "r1"=>"c1"  "r1"=>"c2"
 "r2"=>"c1"  "r2"=>"c2"
 "r3"=>"c1"  "r3"=>"c2"
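
To tie this back to the question about functions, here is a small sketch using only Base broadcasting (the names `zipped` and `crossed` are made up for illustration). Two Vectors of the same length are paired element by element, while a Vector against a one-row Matrix is expanded along both dimensions:

```julia
using Statistics

cols = [:x, :y]

# Vector .=> Vector: same shape, so elements are paired one to one
zipped = cols .=> [mean, maximum]

# Vector .=> 1×2 Matrix: the column and the row are expanded to a 2×2 grid
crossed = cols .=> [mean maximum]

size(zipped)    # (2,)
size(crossed)   # (2, 2)
```

So the Vector of functions is not wrong as such; it just has the same shape as `cols`, and broadcasting then has nothing to expand.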