Aggregate deprecated, use combine

Dear Julia community,

I am new to Julia. Here is my problem with the aggregate function.

using DataFrames, Statistics  # Statistics provides mean and std
df6 = DataFrame(Group = rand(["A", "B", "C"], 15), Variable1 = randn(15), Variable2 = rand(15));
aggregate(df6, :Group, [mean, std]);

I get the following warning:

Warning: aggregate(d, cols, fs, sort=false, skipmissing=false) is deprecated. Instead use combine(groupby(d, cols, sort=false, skipmissing=false), [names(d, Not(cols)) .=> f for f in fs]...) if functions in fs have unique names.

I don’t quite understand what [names(d, Not(cols)) .=> f for f in fs] means.
Could someone show me how to rewrite this line with combine?

Thanks,
Jian

Here is a way where you explicitly list the columns you want statistics for:

combine(groupby(df6,:Group),[:Variable1,:Variable2] .=> [mean,std])

@pdeffebach will come through with a more elegant solution in a few posts I’m sure.


@tbeason thanks for the hint.

It returns mean(Variable1) and std(Variable2) for each group.

However, I expected mean(Variable1), std(Variable1), mean(Variable2), and std(Variable2) for each group.

How can I get that?

Have a great day! :slight_smile:

Whoops, sorry, I guess I didn’t pay enough attention!

Here is one way to apply both functions to both columns:

combine(groupby(df6,:Group),([:Variable1,:Variable2] .=> f for f in (mean,std))...)

which is kind of shorthand for

combine(groupby(df6,:Group),[:Variable1,:Variable2] .=> mean,[:Variable1,:Variable2] .=> std)
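
To see why the first attempt only gave one statistic per column, it may help to look at the pairs themselves, outside of combine. This is just a sketch: the names cols, elementwise, with_mean and per_function are made up for illustration.

```julia
using Statistics

cols = [:Variable1, :Variable2]

# vector .=> vector pairs elementwise:
# Variable1 goes with mean, Variable2 goes with std.
elementwise = cols .=> [mean, std]

# vector .=> single function pairs every column with that function.
with_mean = cols .=> mean

# The generator builds one such vector of pairs per function.
per_function = collect(cols .=> f for f in (mean, std))
```

So per_function holds two vectors of pairs, one for mean and one for std, and those are the two arguments that end up being passed to combine.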

@tbeason thanks a lot. It works as expected.

Now I know where I got stuck. What do the trailing dots mean in this line?

The trailing dots are splatting. Say you have a function

fun(x, y, z)

but you only have your variables in a vector t = [1, 2, 3]. Then you can do

fun(t...)
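
As a small runnable sketch (the body of fun here is made up, just so there is something to call):

```julia
# Hypothetical three-argument function, standing in for fun above.
fun(x, y, z) = x + 10y + 100z

t = [1, 2, 3]

# Splatting the vector passes its elements as separate arguments.
a = fun(t...)
b = fun(1, 2, 3)

# Any iterable can be splatted, including a generator,
# which is what happens in the combine call.
c = fun((2i for i in 1:3)...)
d = fun(2, 4, 6)
```

Here a and b are the same call, and so are c and d.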

I don’t think I have a better idea of how to do what you want than what is given in the warning message. It’s definitely wordier, but that’s the price we pay for reducing the surface of the API a bit.
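
For reference, here is roughly what the recipe from the warning looks like when applied to df6. Treat it as a sketch: value_cols and result are names I made up, and the sort and skipmissing keywords from the warning are left at their defaults.

```julia
using DataFrames, Statistics

df6 = DataFrame(Group = rand(["A", "B", "C"], 15),
                Variable1 = randn(15),
                Variable2 = rand(15))

# All column names except the grouping column.
value_cols = names(df6, Not(:Group))

# One vector of column => function pairs per function,
# splatted into combine as separate arguments.
result = combine(groupby(df6, :Group),
                 [value_cols .=> f for f in [mean, std]]...)
```

The nice part of names(df6, Not(:Group)) is that you don't have to type out the columns by hand, so it keeps working if you add more variables later.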


Thanks Peter! It looks equivalent to unpacking a list with * in Python.