I am computing weighted means of subgroups using the groupby and transform approach. See below for an illustration.
In the weighted case the new column is automatically named Income_Weight_function. My understanding is that the name follows the pattern sourcevar1_sourcevar2_function because the weighted mean function does not return a single value or vector (explained here).
I need to get weighted means of several columns and so I am wondering if there is any way to set the column name within the transform command? Or does this have to be done in a separate step?
Thanks for helping with this!
using DataFrames, Statistics, StatsBase  # mean is from Statistics, weights is from StatsBase
df = DataFrame(Region = ["state1", "state1", "state1", "state2", "state2", "state2"], Income = [10, 7, 12, 10, 7, 12], Weight = [51, 20, 86, 75, 125, 16])
gdf = groupby(df, :Region)
df_reg_mean_unweighted = transform(gdf, :Income => mean => :Region_mean_income_unweighted) # Income unweighted
│ Row │ Region │ Income │ Weight │ Region_mean_income_unweighted │
│     │ String │ Int64  │ Int64  │ Float64                       │
├─────┼────────┼────────┼────────┼───────────────────────────────┤
│ 1   │ state1 │ 10     │ 51     │ 9.66667                       │
│ 2   │ state1 │ 7      │ 20     │ 9.66667                       │
│ 3   │ state1 │ 12     │ 86     │ 9.66667                       │
│ 4   │ state2 │ 10     │ 75     │ 9.66667                       │
│ 5   │ state2 │ 7      │ 125    │ 9.66667                       │
│ 6   │ state2 │ 12     │ 16     │ 9.66667                       │
df_reg_mean_weighted = transform(gdf, [:Income, :Weight] => (x, y) -> (mean(x, weights(y)))) # Income weighted
│ Row │ Region │ Income │ Weight │ Income_Weight_function │
│     │ String │ Int64  │ Int64  │ Float64                │
├─────┼────────┼────────┼────────┼────────────────────────┤
│ 1   │ state1 │ 10     │ 51     │ 10.7134                │
│ 2   │ state1 │ 7      │ 20     │ 10.7134                │
│ 3   │ state1 │ 12     │ 86     │ 10.7134                │
│ 4   │ state2 │ 10     │ 75     │ 8.41204                │
│ 5   │ state2 │ 7      │ 125    │ 8.41204                │
│ 6   │ state2 │ 12     │ 16     │ 8.41204                │
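
For reference on the naming question, here is a minimal sketch of how the target name might be set inside the same call. It assumes the source => function => target form used in the unweighted example also works with an anonymous function once that function is wrapped in parentheses; without them, a trailing => :name would be parsed as part of the function body. The helper wmean_pair, the vector value_cols, and the target column names are hypothetical and only there for illustration.

```julia
using DataFrames, Statistics, StatsBase

df = DataFrame(Region = ["state1", "state1", "state1", "state2", "state2", "state2"],
               Income = [10, 7, 12, 10, 7, 12],
               Weight = [51, 20, 86, 75, 125, 16])
gdf = groupby(df, :Region)

# Parentheses around the anonymous function keep `=> :name` out of its body,
# so the last element of the pair chain is read as the target column name.
df_named = transform(gdf,
    [:Income, :Weight] => ((x, w) -> mean(x, weights(w))) => :Region_mean_income_weighted)

# Hypothetical helper: builds one source => function => target pair per value column.
wmean_pair(col) = [col, :Weight] =>
    ((x, w) -> mean(x, weights(w))) =>
    Symbol("Region_mean_", lowercase(string(col)), "_weighted")

# Assumed list of columns to average; add further column names here as needed.
value_cols = [:Income]
df_multi = transform(gdf, [wmean_pair(c) for c in value_cols]...)
```

With more entries in value_cols, the same transform call should produce one named weighted-mean column per entry, so no separate renaming step would be needed.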