Dear Community,
I have a DataFrame df, which I group for instance like:
foo = by(df, [:param1, :param2], :res => mean)
Here, columns param1 and param2 store parameter values and column res stores result values. Foo shows the mean result values for each parameter combination.
I also have another column :flag which stores exit flags (integers) of each result. How can the example above be adapted to consider only values in which certain exit flags are set? e.g., something like:
Thank you, this works!
Unlikely I forgot to mention the following: suppose there is a parameter combination in which there is no df.flag .<=2. These combinations are omitted in the resulting dataframe. Is there a way to force them to be listed (for instance with NaN values)?
Not currently. We could add an argument to enable that (e.g. droplevels=false), but note that in any case that would only work if df.flag is a CategoricalArray, since otherwise there’s no way by can know what are the possible values.
Create a new column that takes a value 1 when either flag == 1 or flag == 2 and 0 otherwise. Then also use this as a grouping column in the by command.
by(df, [:param1, :param2, :newcol], :res=>mean)
It will also compute the mean for values with flag > 2, you can just reset them to NaN if you want.