Why are missing values not ignored by default?

We can allow ? as a general-purpose postfix operator; it doesn’t have to be only for missing values. Missings.jl would then be just one package making use of it.

It could look like this:

  1. Allow ? as a postfix operator like ' (but without any method definition in Base, initially at least)
  2. In Missings.jl define ? as a shorthand for skipmissing (this could also go in a new ShortMissings.jl package or whatever)
  3. Fix mean to work well with skipmissing (cf. this comment)
  4. Add overloads to cor, etc. to properly support things like cor(skipmissing(a), skipmissing(b))
  5. Add support for skipmissing-boolean indexing in DataFrames.jl
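To make step 4 concrete: skipping each vector independently would misalign the pairs, so the overload presumably has to drop every position where either input is missing (pairwise deletion). Here is a minimal sketch of that behavior on plain vectors. `cor_pairwise` is a hypothetical helper name, not something in Statistics or Missings.jl, and converting to Float64 is a simplifying assumption.

```julia
using Statistics

# Hypothetical helper (not part of Statistics or Missings.jl):
# correlation over the positions where both inputs are present.
function cor_pairwise(a::AbstractVector, b::AbstractVector)
    # eachindex(a, b) throws a DimensionMismatch if the index sets differ
    idx = [i for i in eachindex(a, b) if !ismissing(a[i]) && !ismissing(b[i])]
    # Assumption: values are numeric, so converting to Float64 is acceptable
    return cor(Float64[a[i] for i in idx], Float64[b[i] for i in idx])
end

a = [1, 2, missing, 4, 5]
b = [2, 4, 6, missing, 10]
r = cor_pairwise(a, b)   # uses only the pairs (1, 2), (2, 4) and (5, 10)
```

The three surviving pairs lie on a straight line, so `r` comes out at approximately 1.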

At this point we can do things like this:

mean(x?)
cor(x?, v?)   # same behavior as polars or pandas

df[(df.x .> 0)?, :]
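For comparison, here is a sketch of what the boolean-indexing case needs today, shown on a plain vector instead of a data frame so it stays self-contained. A mask that contains missing is not accepted as an index, so the usual workaround is to map missing to false with coalesce first.

```julia
x = [3, -1, missing, 7]

mask = x .> 0                   # elements are true, false, missing, true
keep = coalesce.(mask, false)   # missing becomes false, giving a plain Bool mask
selected = x[keep]              # keeps the entries 3 and 7
```

The proposed `(df.x .> 0)?` would let the indexing code do this step itself instead of spelling out the coalesce call at every use.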

But let’s go further:

  1. In Missings.jl define ?(f::Function) to make wrappers that skip missing values:

    cor?(x,v) would mean MissingSkipper(cor)(x,v) which would eventually call cor(x?, v?)

It’s a nice shortcut but especially useful for cases like this:

combine(gdf, :value => mean?)
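For reference, function composition already gives a similar wrapper in current Julia: `mean ∘ skipmissing` is an ordinary callable that can be passed wherever `mean?` would go. A minimal sketch on plain vectors follows; the `groups` dictionary is made-up illustration data standing in for a grouped data frame.

```julia
using Statistics

skipmean = mean ∘ skipmissing   # behaves like v -> mean(skipmissing(v))

# Made-up data standing in for the groups of a grouped data frame
groups = Dict("a" => [1, 2, missing, 3], "b" => [10.0, missing])
means = Dict(k => skipmean(v) for (k, v) in groups)
```

With DataFrames.jl the same idea is written combine(gdf, :value => mean ∘ skipmissing). Composition only covers single-argument functions, though, which is where a MissingSkipper-style wrapper would add something.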

Here’s a working prototype with 'ˢ instead of ?:

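using Statistics  # provides mean and cor

# Wraps a function so that every positional argument is passed through skipmissing
struct MissingSkipper{T}
    f::T
end

# Postfix 'ˢ on an array is shorthand for skipmissing
var"'ˢ"(x::AbstractArray) = skipmissing(x)
# Postfix 'ˢ on a function or type returns the missing-skipping wrapper
var"'ˢ"(f::Base.Callable) = MissingSkipper(f)
(s::MissingSkipper)(args...; kwargs...) =
    (s.f)(skipmissing.(args)...; kwargs...)

x = [1, 2, missing, 3]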

julia> mean(x'ˢ)
2.0

julia> mean'ˢ(x)
2.0

# would work if cor(x::SkipMissing, y::SkipMissing) were defined:
julia> cor'ˢ(x, x)