How to calculate a weighted mean with missing observations

tkoolen · January 5, 2019, 4:16pm

Continuing from NA-ignoring aggregations, here are two other considerations regarding NA-like values.

First, does it make sense to make missing special? There’s already a PR open that adds skipnothing (or skipoftype). There may still be cases where it’s preferred to skip NaN instead. Should this just be generalized to skip(somevalue, iterator)?
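To make the generalization concrete, here is a minimal sketch of what such a function might look like. The name skipvalue is hypothetical (chosen here to avoid clashing with the existing Base.skip for IO streams), and nothing like it exists in Base; it simply wraps Iterators.filter.

```julia
# Hypothetical helper, not part of Base: lazily skip every element equal to `value`.
# `isequal` is used instead of `==` so that NaN and missing match themselves
# (NaN == NaN is false, and missing == missing is missing).
skipvalue(value, itr) = Iterators.filter(x -> !isequal(x, value), itr)

xs = [1.0, NaN, 3.0]
total_without_nan = sum(skipvalue(NaN, xs))          # 4.0

ys = [1, missing, 3]
total_without_missing = sum(skipvalue(missing, ys))  # 4
```

Like skipmissing, the result is a lazy iterator with no shape of its own, so the dims question raised next applies to it as well.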

Second, consider the case of a matrix with missing values where you’d like to get the means of the rows. Without missing values, you’d use mean(A, dims=2). But mean(skipmissing(A), dims=2) doesn’t do what you want. You can’t really have SkipMissing be a shapeful iterator: SizeUnknown() is the only sensible return value for Base.IteratorSize(::SkipMissing), even if the underlying iterator has a shape, because the number of skipped elements isn’t known up front. So I see two options to fix this:

Functions like mean and sum could specifically be overloaded for Base.SkipMissing{<:AbstractArray} so that mean(skipmissing(A), dims=2) just works.
Instead of making mean(skipmissing(A), dims=2) work, you could replace it with [mean(skipmissing(x)) for x in eachslice(A, dims=1)] (with Julia 1.1). Note that dims=2 in mean reduces along each row, while eachslice(A, dims=1) iterates over the rows, so both give one mean per row.
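As a small illustration of the second option, the sketch below uses a made-up 2×3 matrix (the values are only for demonstration) and computes per-row and per-column means while skipping missing entries.

```julia
using Statistics  # provides mean

# Example data, invented for illustration.
A = [1.0 missing 3.0;
     4.0 5.0     missing]

# eachslice(A, dims=1) iterates over the rows of A (requires Julia 1.1 or later).
row_means = [mean(skipmissing(row)) for row in eachslice(A, dims=1)]
# row_means == [2.0, 4.5]

# Slicing along dims=2 iterates over the columns instead.
col_means = [mean(skipmissing(col)) for col in eachslice(A, dims=2)]
# col_means == [2.5, 5.0, 3.0]
```

The result is a plain Vector rather than the 2×1 or 1×3 matrix that a dims reduction would return, which is one visible difference between the two approaches.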

I think the latter is preferable. I’m not sure how many functions like mean and sum have a dims keyword, but it seems like a pain to make them all be SkipMissing-aware. Taking this further, I’m of the opinion that the dims arguments for these functions represent a poor separation of concerns (and I know @andyferris shares this opinion, https://github.com/JuliaArrays/StaticArrays.jl/issues/498#issuecomment-450794143). So should they just be removed in favor of the eachslice version that can be easily made to work with skipmissing?
