Apply weights in JuliaDB groupby

pmarg · May 10, 2018, 3:48pm

Hi,

I’ve been struggling to find a way to calculate the weighted descriptive statistics in JuliaDB using the groupby function. Assuming that “w” is a column of the table DB with sampling weights, I would like to find something like this:

groupby(@NT(avg=z->mean(z, weights(:w)), DB, (:x,:y), select=:z)

I understand why the above doesn’t work, but I cannot come up with a way to pass the weights w in the mean (or any other) function.

Thanks!

piever · May 10, 2018, 4:24pm

You need to select both columns in the groupby, meaning:

groupby(@NT(avg=z->mean(column(z, :z), weights(column(z,:w))), DB, (:x,:y), select=(:z, :w))

To manipulate symbols as columns as you were doing you’d need some sort of metaprogramming, for example:

using JuliaDBMeta
@groupby DB (:x, :y) @NT(avg = mean(:z, weights(:w))))

pmarg · May 10, 2018, 5:07pm

Thank you very much, that worked!

Could you include an example like the above in your JuliaDB tutorial for others to see? I found it extremely helpful and I believe this should be a common question for the ones that work with survey data.

I see that there is an example that you call a column but it wasn’t clear to me that it has general applications.

For example, in the above, I was trying:

groupby(@NT(avg=p->mean(p.z, weights(p.w)), DB, (:x,:y), select=(:z,:w))

because I was mimicking the filter function.

piever · May 10, 2018, 5:33pm

Good point, I’ll go through the tutorial again shortly as we decided to add it to the official docs, I’ll definitely keep in mind that grouping when needing two columns is a bit tricky so I’ll add an example about that. The “inconvience” is that t.x works on a row to select a field, but not on the table to select a column, and you need column(t, :x). Though in Julia 0.7 we could probably use dot overloading and allow t.x to select a column (if there’s consensus to do that).

Topic		Replies	Views
JuliaDB groupby many columns General Usage juliadb	0	361	April 9, 2021
Frequency Weights in DataFrames Data question , dataframes	2	798	March 17, 2021
Name of dataframe column created by function New to Julia dataframes	5	610	December 2, 2020
Apply some functions to columns of a dataframe General Usage question , dataframes	5	1476	November 10, 2021
Applying weights to Randomized Arrays In a Dataframe Statistics	1	405	November 12, 2021

Apply weights in JuliaDB groupby

Related topics