Following on from NA-ignoring aggregations, here are two other considerations regarding NA-like values.
First, does it make sense to make `missing` special? There's already a PR open that adds `skipnothing` (or `skipoftype`). There may still be cases where it's preferable to skip `NaN` instead. Should this just be generalized to `skip(somevalue, iterator)`?
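A minimal sketch of what such a generalization could look like, built on `Iterators.filter`. The name `skipvalue` is hypothetical (chosen to avoid `skip`, which Base already uses for seeking IO streams). It compares with `isequal` rather than `==`, because `NaN == NaN` is `false` and `missing == missing` is `missing`, whereas `isequal` matches `NaN`, `missing` and `nothing` as expected.

```julia
using Statistics

# Hypothetical generalization of skipmissing: lazily drop every element
# of `itr` that is `isequal` to `v`.
skipvalue(v, itr) = Iterators.filter(x -> !isequal(x, v), itr)

xs = [1.0, NaN, 3.0]
collect(skipvalue(NaN, xs))               # [1.0, 3.0]
mean(skipvalue(NaN, xs))                  # 2.0
sum(skipvalue(nothing, [1, nothing, 2]))  # 3
```

One difference from `skipmissing`: the element type is not narrowed, so `eltype(skipvalue(nothing, [1, nothing, 2]))` is still `Union{Nothing, Int}`.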
Second, consider the case of a matrix with missing values where you'd like to get the means of the rows. Without missing values, you'd use `mean(A, dims=2)`. But `mean(skipmissing(A), dims=2)` doesn't work. Now, you can't really have `SkipMissing` be a shapeful iterator: since the number of non-missing elements isn't known without scanning, `SizeUnknown()` is the only sensible return value for `Base.IteratorSize(::SkipMissing)`, even if the underlying iterator has a shape. So I see two options to fix this:

- Functions like `mean` and `sum` could specifically be overloaded for `Base.SkipMissing{<:AbstractArray}` so that `mean(skipmissing(A), dims=2)` just works.
- Instead of making `mean(skipmissing(A), dims=2)` work, you could replace it with `[mean(skipmissing(x)) for x in eachslice(A, dims=1)]` (requires Julia 1.1 or later, where `eachslice` was introduced).
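A sketch comparing the two options on a small matrix, under stated assumptions. It first shows that the shape really is lost through `skipmissing`. The option 1 method is hypothetical: defining it outside Statistics is type piracy, and it reaches into `itr.x`, an internal field of `SkipMissing`. It uses `mapslices`, which follows the same `dims` convention as `mean(A, dims=...)`.

```julia
using Statistics

A = [1 missing; 3 4]

# The iterator reports SizeUnknown and yields A's values in column-major
# order with the missing entry dropped, so rows and columns are not visible.
itr = skipmissing(A)
Base.IteratorSize(typeof(itr))   # Base.SizeUnknown()
collect(itr)                     # [1, 3, 4]

# Option 1 (hypothetical overload): honor `dims` by going back to the
# wrapped array. `dims=2` applies the function to each row and keeps a
# singleton second dimension, matching mean(A, dims=2).
function Statistics.mean(itr::Base.SkipMissing{<:AbstractArray}; dims=:)
    dims isa Colon && return mean(collect(itr))   # whole-array mean
    return mapslices(x -> mean(collect(skipmissing(x))), itr.x; dims=dims)
end

opt1 = mean(skipmissing(A), dims=2)   # 2×1 Matrix{Float64}: 1.0 and 3.5

# Option 2: no new methods; iterate over the rows instead.
opt2 = [mean(skipmissing(x)) for x in eachslice(A, dims=1)]   # [1.0, 3.5]
```

Note that the two spellings use opposite `dims` conventions: in `mean(A, dims=2)` the number names the dimension that is reduced away, while in `eachslice(A, dims=1)` it names the dimension that is kept.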
I think the latter is preferable. I'm not sure how many functions like `mean` and `sum` have a `dims` keyword, but it seems like a pain to make them all `SkipMissing`-aware. Taking this further, I'm of the opinion that the `dims` arguments for these functions represent a poor separation of concerns (and I know @andyferris shares this opinion, https://github.com/JuliaArrays/StaticArrays.jl/issues/498#issuecomment-450794143). So should they just be removed in favor of the `eachslice` version, which can easily be made to work with `skipmissing`?
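To illustrate the separation-of-concerns argument, here is a sketch with a hypothetical `rowwise` helper: the slicing is done once by `eachslice`, and any reducer that accepts an iterator composes with `skipmissing` without needing its own `dims` keyword.

```julia
using Statistics

# Hypothetical helper: apply reducer `f` to each row of `A`, skipping missings.
# eachslice(A, dims=1) yields the rows; `f` itself takes no `dims` argument.
rowwise(f, A::AbstractMatrix) = [f(skipmissing(r)) for r in eachslice(A, dims=1)]

A = [1 missing; 3 4]
rowwise(sum, A)       # [1, 7]
rowwise(maximum, A)   # [1, 4]
rowwise(mean, A)      # [1.0, 3.5]

# Without missing values it agrees with the existing dims keyword,
# where dims=2 reduces over columns to give one mean per row.
B = [1.0 2.0; 3.0 4.0]
vec(mean(B, dims=2)) == rowwise(mean, B)   # true
```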