Apply weights in JuliaDB groupby


#1

Hi,

I’ve been struggling to find a way to calculate the weighted descriptive statistics in JuliaDB using the groupby function. Assuming that “w” is a column of the table DB with sampling weights, I would like to find something like this:

groupby(@NT(avg=z->mean(z, weights(:w)), DB, (:x,:y), select=:z)

I understand why the above doesn’t work, but I cannot come up with a way to pass the weights w in the mean (or any other) function.

Thanks!


#2

You need to select both columns in the groupby, meaning:

groupby(@NT(avg=z->mean(column(z, :z), weights(column(z,:w))), DB, (:x,:y), select=(:z, :w))

To manipulate symbols as columns as you were doing you’d need some sort of metaprogramming, for example:

using JuliaDBMeta
@groupby DB (:x, :y) @NT(avg = mean(:z, weights(:w))))

#3

Thank you very much, that worked!

Could you include an example like the above in your JuliaDB tutorial for others to see? I found it extremely helpful and I believe this should be a common question for the ones that work with survey data.

I see that there is an example that you call a column but it wasn’t clear to me that it has general applications.

For example, in the above, I was trying:

groupby(@NT(avg=p->mean(p.z, weights(p.w)), DB, (:x,:y), select=(:z,:w))

because I was mimicking the filter function.


#4

Good point, I’ll go through the tutorial again shortly as we decided to add it to the official docs, I’ll definitely keep in mind that grouping when needing two columns is a bit tricky so I’ll add an example about that. The “inconvience” is that t.x works on a row to select a field, but not on the table to select a column, and you need column(t, :x). Though in Julia 0.7 we could probably use dot overloading and allow t.x to select a column (if there’s consensus to do that).