# DataFramesMeta computing weighted median within @by

I’m struggling to figure this out. I have some data that look like this:

```julia
using DataFrames
using DataFramesMeta
using StatsBase

df = DataFrame(a=[1,1,2,2,3,3,4,4,5,5,6,6], b=rand(vcat(1:3, missing), 12), c=rand(10:50, 12))
```

and I need to compute a weighted median of `:b` for each `:a` in the presence of `missing` values. I would compute the non-weighted version like so:

```julia
@chain df begin
    @by(:a,
        median_b = StatsBase.median(skipmissing(:b))
    )
end
```

I’ve tried several ways to compute the weighted median, but without any success. Here are a couple of examples:

```julia
@chain df begin
    @by(:a,
        median_b = StatsBase.median(
            skipmissing(:b),
            pweights(Array{Int64,1}(getindex(:c, map(x -> !ismissing(x), :b))))
        )
    )
end

# produces a MethodError

@chain df begin
    @by(:a,
        median_b = StatsBase.median(
            skipmissing(:b),
            pweights(:c[:b .!== missing])
        )
    )
end

# also produces a MethodError
```

This always happens. I banged my head against the wall for half an hour trying to figure this out, and then 5 minutes after posting here I figured it out:

```julia
@chain df begin
    @by(:a,
        median_b = StatsBase.median(
            collect(skipmissing(:b)),
            pweights(:c[:b .!== missing])
        )
    )
end
```

I just didn’t think long enough about why I was getting a `MethodError` (or take the time to read the error message carefully). The call to `skipmissing` returns a lazy `Base.SkipMissing` iterator wrapping a `SubArray` of `Union{Missing, Int64}`, not an array, so you just have to `collect` it to get an `Array{Int64,1}`.
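For anyone landing here later, the same thing can be seen outside of `@by` with plain vectors. This is only a minimal sketch with made-up data (`b`, `c`, and the other variable names are illustrative, not from the original `df`); it assumes the weighted `median` method in StatsBase expects an array of values alongside the weights, which matches the error described above.

```julia
using StatsBase

# Toy data standing in for one group of the data frame (illustrative values)
b = [1, missing, 3, 2]
c = [10, 20, 30, 40]

# skipmissing is lazy: it returns an iterator, not an AbstractVector
lazy = skipmissing(b)

# Mask of rows where b is present, used to keep the weights aligned with the values
keep = .!ismissing.(b)

# collect materializes the iterator into a Vector{Int}
vals = collect(lazy)
wts = pweights(c[keep])

# Values and weights now have the same length, and vals is a plain array
wmed = median(vals, wts)
```

Passing `lazy` instead of `vals` to the weighted `median` is what triggers the `MethodError`.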


This is something we want to make easier! See a stale PR here.

As an aside, note that with more recent versions of DataFramesMeta, you can use `begin ... end` blocks, with transformations on separate lines, to avoid ugly parentheses and commas (and to use macro-flags like `@byrow` and `@passmissing` more easily).
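As a rough sketch of what the block form could look like for this problem: this assumes a DataFramesMeta version that supports `begin ... end` blocks and the `:col = ...` syntax for new columns, and it uses a small deterministic data frame in place of the random one from the question. It is worth checking against the documentation for your installed version.

```julia
using DataFrames
using DataFramesMeta
using StatsBase

# Small deterministic stand-in for the data in the question (illustrative values)
df = DataFrame(a=[1, 1, 2, 2], b=[1, missing, 2, 3], c=[10, 20, 30, 40])

result = @chain df begin
    @by :a begin
        # collect the non-missing values and keep only the matching weights
        :median_b = median(
            collect(skipmissing(:b)),
            pweights(:c[.!ismissing.(:b)])
        )
    end
end
```

Each transformation sits on its own line inside the block, so adding further summary columns does not need extra commas or nested parentheses.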


Alternatively, in general you can add `dropmissing(:b, view=true)` as a first step in the chain (`view=true` avoids allocations; otherwise just use `dropmissing(:b)`), and things should simplify.
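A minimal sketch of that approach, using plain DataFrames `combine` rather than `@by` so that it does not depend on a particular DataFramesMeta syntax; the data and the names `clean` and `result` are illustrative. It uses the copying `dropmissing(df, :b)`, because by default that also narrows the element type of `:b` to `Int64`. With `view=true` the column keeps its `Union{Missing, Int64}` element type, so the weighted `median` may still need a `collect(skipmissing(...))` or `disallowmissing` step.

```julia
using DataFrames
using StatsBase

# Small deterministic stand-in for the data in the question (illustrative values)
df = DataFrame(a=[1, 1, 2, 2, 3, 3],
               b=[1, missing, 2, 3, missing, 3],
               c=[10, 20, 30, 40, 50, 60])

# Drop rows where b is missing; by default this also converts b to a non-missing eltype
clean = dropmissing(df, :b)

# Values and weights are already aligned row by row, so no masking is needed
result = combine(groupby(clean, :a),
                 [:b, :c] => ((b, c) -> median(b, pweights(c))) => :median_b)
```

One thing to keep in mind: a group whose `:b` values are all `missing` disappears from the result entirely with this approach, since all of its rows are dropped up front.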
