DataFrames: how to calculate conditional mean values in grouped dataframe?

mike_k · April 16, 2019, 8:45am

Dear Community,
I have a DataFrame df, which I group, for instance, like this:

foo = by(df, [:param1, :param2], :res => mean)

Here, columns param1 and param2 store parameter values and column res stores result values, so foo holds the mean result for each parameter combination.
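On current DataFrames releases, where by is no longer available, the same grouped mean can be sketched with groupby followed by combine. The data below is made up for illustration, and sort = true is passed only so that the group order is predictable.

```julia
using DataFrames, Statistics

# Hypothetical example data: two parameter columns and a result column.
df = DataFrame(param1 = [1, 1, 2, 2],
               param2 = ["a", "a", "b", "b"],
               res    = [1.0, 3.0, 5.0, 7.0])

# Equivalent of by(df, [:param1, :param2], :res => mean):
# group on the parameter columns, then reduce each group with mean.
foo = combine(groupby(df, [:param1, :param2]; sort = true), :res => mean)
```

combine keeps the grouping columns and names the new column res_mean by default.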

I also have another column :flag, which stores the exit flag (an integer) of each result. How can the example above be adapted to consider only rows where certain exit flags are set? For example, something like:

foo = by(df, [:param1, :param2], :res => mean if(:flag .== 1 || :flag .== 2) )

Thank you in advance.

nilshg · April 16, 2019, 10:47am

I'm on my phone so I can't check, but I think

by(df[df.flag.==1, :], [:param1, :param2], :res=>mean)

should do it?
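The condition from the question (flag equal to 1 or 2) can be written as a broadcast membership test and applied before grouping, in the spirit of the answer above. This is a sketch assuming the current DataFrames API (combine(groupby(...)) instead of by) and hypothetical data. Note that the combination (3, "c"), whose only row has flag 3, drops out of the result.

```julia
using DataFrames, Statistics

# Hypothetical data; :flag holds integer exit flags.
df = DataFrame(param1 = [1, 1, 1, 2, 2, 3],
               param2 = ["a", "a", "a", "b", "b", "c"],
               res    = [1.0, 3.0, 100.0, 5.0, 7.0, 9.0],
               flag   = [1, 2, 3, 1, 4, 3])

# Keep only rows whose flag is 1 or 2, then group and average as before.
keep = in.(df.flag, Ref((1, 2)))
foo = combine(groupby(df[keep, :], [:param1, :param2]; sort = true), :res => mean)
```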

mike_k · April 16, 2019, 11:01am

Thank you, this works!
Unfortunately, I forgot to mention the following: suppose there is a parameter combination that has no rows with df.flag <= 2. Such combinations are omitted from the resulting DataFrame. Is there a way to force them to be listed (for instance, with NaN values)?

nalimilan · April 16, 2019, 11:51am

Not currently. We could add an argument to enable that (e.g. droplevels=false), but note that in any case that would only work if df.flag is a CategoricalArray, since otherwise there is no way for by to know what the possible values are.
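A different workaround, not one proposed in this thread: group the full data frame by the parameter columns only and apply the flag condition inside the reduction, returning NaN when a group has no qualifying rows. Every parameter combination then keeps its row. This is a sketch assuming the current DataFrames API; the helper flagged_mean and the data are hypothetical.

```julia
using DataFrames, Statistics

# Hypothetical data: combination (3, "c") has no row with flag 1 or 2.
df = DataFrame(param1 = [1, 1, 2, 3],
               param2 = ["a", "a", "b", "c"],
               res    = [1.0, 3.0, 5.0, 9.0],
               flag   = [1, 2, 1, 3])

# Hypothetical helper: mean of res over rows whose flag is 1 or 2,
# or NaN when the group has no such rows.
function flagged_mean(res, flag)
    sel = res[in.(flag, Ref((1, 2)))]
    return isempty(sel) ? NaN : mean(sel)
end

# Grouping the unfiltered data keeps every combination; the condition
# is applied per group inside the reduction.
foo = combine(groupby(df, [:param1, :param2]; sort = true),
              [:res, :flag] => flagged_mean => :res_mean)
```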

mike_k · April 16, 2019, 11:57am

Okay. Thank you anyway for your fast response.

tbeason · April 16, 2019, 2:15pm

Create a new column that takes the value 1 when flag == 1 or flag == 2, and 0 otherwise. Then also use it as a grouping column in the by call:

by(df, [:param1, :param2, :newcol], :res=>mean)

This also computes means for the rows with flag > 2 (the groups where newcol is 0), so every parameter combination still appears; you can just reset those means to NaN if you want.
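The indicator-column approach can be sketched as follows, assuming the current DataFrames API (combine(groupby(...)) in place of by) and hypothetical data; newcol is the name used in the suggestion.

```julia
using DataFrames, Statistics

# Hypothetical data: combination (3, "c") only has a row with flag 3.
df = DataFrame(param1 = [1, 1, 2, 3],
               param2 = ["a", "a", "b", "c"],
               res    = [1.0, 3.0, 5.0, 9.0],
               flag   = [1, 2, 1, 3])

# Indicator column: 1 when flag is 1 or 2, 0 otherwise.
df.newcol = Int.(in.(df.flag, Ref((1, 2))))

# Group on the indicator too, so rows with other flags form their own groups.
foo = combine(groupby(df, [:param1, :param2, :newcol]; sort = true), :res => mean)

# Replace the means computed from unwanted flags with NaN (in place).
foo.res_mean[foo.newcol .== 0] .= NaN
```

A combination that has rows of both kinds appears twice in foo, once for each value of newcol, and only its newcol == 0 row is set to NaN.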

mike_k · April 16, 2019, 2:23pm

This is a nice solution. Thank you!!
