I have a DataFrame df, which I group for instance like:
foo = by(df, [:param1, :param2], :res => mean)
Here, columns param1 and param2 store parameter values and column res stores result values. Foo shows the mean result values for each parameter combination.
I also have another column :flag which stores exit flags (integers) of each result. How can the example above be adapted to consider only values in which certain exit flags are set? e.g., something like:
foo = by(df, [:param1, :param2], :res => mean if(:flag .= 1 || :flag.=2) )
Thank you in advance.
On the phone so can’t check but I think
by(df[df.flag.==1, :], [:param1, :param2], :res=>mean)
Thank you, this works!
Unlikely I forgot to mention the following: suppose there is a parameter combination in which there is no df.flag .<=2. These combinations are omitted in the resulting dataframe. Is there a way to force them to be listed (for instance with NaN values)?
Not currently. We could add an argument to enable that (e.g.
droplevels=false), but note that in any case that would only work if
df.flag is a
CategoricalArray, since otherwise there’s no way
by can know what are the possible values.
Okay. Thank you anyway for your fast response
Create a new column that takes a value 1 when either
flag == 1 or
flag == 2 and 0 otherwise. Then also use this as a grouping column in the
by(df, [:param1, :param2, :newcol], :res=>mean)
It will also compute the mean for values with
flag > 2, you can just reset them to
NaN if you want.
This is a nice solution. Thank you!!