How to calculate a weighted mean with missing observations

Following on from the discussion of NA-ignoring aggregations, here are two other considerations regarding NA-like values.

First, does it make sense to make missing special? There’s already a PR open that adds skipnothing (or skipoftype). There may still be cases where it’s preferred to skip NaN instead. Should this just be generalized to skip(somevalue, iterator)?
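As a rough sketch of what such a generalization might look like, the snippet below builds a lazy "skip this value" iterator on top of Iterators.filter. The name skipvalue is hypothetical (it is not in Base and is not the function from the PR), and comparing with isequal is an assumption made here so that both missing and NaN match themselves.

```julia
# Hypothetical helper, not part of Base: lazily skip every element that is
# `isequal` to `value`. isequal is used (rather than ==) because
# isequal(NaN, NaN) and isequal(missing, missing) are both true.
skipvalue(value, itr) = Iterators.filter(x -> !isequal(x, value), itr)

xs = [1.0, NaN, 3.0, missing]

# Skip NaN on top of skipmissing; only 1.0 and 3.0 remain.
total = sum(skipvalue(NaN, skipmissing(xs)))

# The same helper can stand in for skipmissing itself.
kept = collect(skipvalue(missing, [1, missing, 3]))
```

Like skipmissing, the result is a lazy wrapper, so nothing is copied until the values are actually consumed.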

Second, consider the case of a matrix with missing values where you’d like to get the mean of each column. Without missing values, you’d use mean(A, dims=1). But mean(skipmissing(A), dims=1) doesn’t do what you want. Now you can’t really have SkipMissing be a shapeful iterator: the number of elements left after skipping isn’t known ahead of time, so SizeUnknown() is really the only sensible return value for Base.IteratorSize(::SkipMissing), even if the underlying iterator has a shape. So I see two options to fix this:

  1. Functions like mean and sum could specifically be overloaded for Base.SkipMissing{<:AbstractArray} so that mean(skipmissing(A), dims=1) just works.
  2. Instead of making mean(skipmissing(A), dims=1) work, you could replace it with [mean(skipmissing(x)) for x in eachslice(A, dims=2)] (eachslice requires Julia 1.1 or later). Note that eachslice takes the dimension to iterate over, while the dims keyword of mean takes the dimension to reduce over, so the column slices from dims=2 correspond to mean(A, dims=1).
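To make the second option concrete, here is a small sketch with a made-up 2×3 matrix. It assumes Julia 1.1 or later for eachslice. Note that eachslice(A, dims=2) yields the columns and eachslice(A, dims=1) yields the rows, so the comprehension over dims=2 is the one that lines up with mean(A, dims=1).

```julia
using Statistics

# Example data, invented for illustration.
A = [1.0 missing 3.0;
     4.0 5.0     missing]

# One mean per column, skipping missing entries within each column.
colmeans = [mean(skipmissing(col)) for col in eachslice(A, dims=2)]  # [2.5, 5.0, 3.0]

# One mean per row, skipping missing entries within each row.
rowmeans = [mean(skipmissing(row)) for row in eachslice(A, dims=1)]  # [2.0, 4.5]

# Sanity check on a matrix without missing values: the column-wise
# comprehension agrees with mean(B, dims=1), apart from the result's shape.
B = [1.0 2.0; 3.0 4.0]
same = vec(mean(B, dims=1)) ≈ [mean(col) for col in eachslice(B, dims=2)]
```

One difference to keep in mind: the comprehension returns a plain Vector, whereas mean(B, dims=1) returns a 1×n matrix.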

I think the latter is preferable. I’m not sure how many functions like mean and sum have a dims keyword, but it seems like a pain to make them all be SkipMissing-aware. Taking this further, I’m of the opinion that the dims arguments for these functions represent a poor separation of concerns (and I know @andyferris shares this opinion, https://github.com/JuliaArrays/StaticArrays.jl/issues/498#issuecomment-450794143). So should they just be removed in favor of the eachslice version that can be easily made to work with skipmissing?
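To illustrate the separation-of-concerns point, the slicing can be written once and then combined with any reduction, rather than each reduction growing its own dims keyword. The helper name reduce_skipmissing below is hypothetical and exists only for this sketch. Its dims keyword is passed straight to eachslice, so it names the dimension that is iterated over, not the one that is reduced.

```julia
using Statistics

# Hypothetical helper (not in Base or Statistics): apply the reduction `f`
# to each slice of `A`, skipping missing entries. `dims` is forwarded to
# eachslice, so dims=1 gives one result per row and dims=2 one per column.
reduce_skipmissing(f, A; dims) = [f(skipmissing(x)) for x in eachslice(A, dims=dims)]

A = [1.0 missing;
     3.0 4.0]

rowsums  = reduce_skipmissing(sum, A, dims=1)      # [1.0, 7.0]
colmeans = reduce_skipmissing(mean, A, dims=2)     # [2.0, 4.0]
colmaxes = reduce_skipmissing(maximum, A, dims=2)  # [3.0, 4.0]
```

None of sum, mean, or maximum needs to know anything about slices or about SkipMissing beyond being able to iterate it.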
