DataFrames: how to calculate conditional mean values in grouped dataframe?

Dear Community,
I have a DataFrame df, which I group for instance like:

foo = by(df, [:param1, :param2], :res => mean)

Here, columns param1 and param2 store parameter values and column res stores result values. Foo shows the mean result values for each parameter combination.

I also have another column :flag which stores exit flags (integers) of each result. How can the example above be adapted to consider only values in which certain exit flags are set? e.g., something like:

foo = by(df, [:param1, :param2], :res => mean if(:flag .= 1 || :flag.=2) )

Thank you in advance.

On the phone so can’t check but I think

by(df[df.flag.==1, :], [:param1, :param2], :res=>mean)

Should do?

2 Likes

Thank you, this works!
Unlikely I forgot to mention the following: suppose there is a parameter combination in which there is no df.flag .<=2. These combinations are omitted in the resulting dataframe. Is there a way to force them to be listed (for instance with NaN values)?

Not currently. We could add an argument to enable that (e.g. droplevels=false), but note that in any case that would only work if df.flag is a CategoricalArray, since otherwise there’s no way by can know what are the possible values.

1 Like

Okay. Thank you anyway for your fast response :slight_smile:

Create a new column that takes a value 1 when either flag == 1 or flag == 2 and 0 otherwise. Then also use this as a grouping column in the by command.

by(df, [:param1, :param2, :newcol], :res=>mean)

It will also compute the mean for values with flag > 2, you can just reset them to NaN if you want.

3 Likes

This is a nice solution. Thank you!!