skipmissing not working in cor function

First, you are applying skipmissing too late.

But even if you were to correct it, cor wouldn’t work: it expects arrays, not the lazy iterator that skipmissing returns. This is a long-standing annoyance.

Skipping missing values in each vector separately is not a good idea either, since the remaining observations are not guaranteed to be matched.
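To illustrate the mismatch, here is a small sketch with made-up vectors (the values are purely illustrative, not from the original question). Dropping missings from each vector on its own happens to leave two vectors of equal length, so cor would run without complaint, but the pairs no longer come from the same rows:

```julia
# Illustrative data: row 2 is missing in x, row 3 is missing in y.
x = [1.0, missing, 3.0, 4.0]
y = [2.0, 5.0, missing, 8.0]

# Drop missings independently from each vector.
sx = collect(skipmissing(x))   # [1.0, 3.0, 4.0]
sy = collect(skipmissing(y))   # [2.0, 5.0, 8.0]

# Pair up what is left, as cor(sx, sy) would implicitly do.
pairs = collect(zip(sx, sy))   # [(1.0, 2.0), (3.0, 5.0), (4.0, 8.0)]

# The middle pair combines x from row 3 with y from row 2.
# The only complete rows in the data are (1.0, 2.0) and (4.0, 8.0).
```

If the two vectors had different numbers of missing values, the lengths would differ and cor would throw an error instead, which is arguably the luckier outcome.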

We don’t have a good solution for this at the moment. Missings.jl (which is re-exported by DataFrames) provides skipmissings, which skips an index in all of the arrays whenever any of them has a missing value there:

julia> using Missings, Statistics

julia> x = [rand() < .2 ? missing : rand() for i in 1:10];

julia> y = [rand() < .2 ? missing : rand() for i in 1:10];

julia> sx, sy = collect.(skipmissings(x, y));

julia> cor(sx, sy)
-0.32257867573052007

But skipmissings is not guaranteed to exist in the future. It’s deliberately documented as such, even though Missings.jl is past 1.0.
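If you would rather not depend on skipmissings, one possible workaround is to build the row mask by hand. This is only a sketch using Base and Statistics, with made-up data and hypothetical variable names (keep, cx, cy); it assumes the non-missing entries are all Float64:

```julia
using Statistics

# Illustrative data: row 2 is missing in x, row 3 is missing in y.
x = [1.0, missing, 3.0, 4.0, 6.0]
y = [2.0, 5.0, missing, 8.0, 11.0]

# true for rows where neither x nor y is missing.
keep = .!(ismissing.(x) .| ismissing.(y))

# Select the complete rows and narrow the element type to Float64.
cx = Float64.(x[keep])   # [1.0, 4.0, 6.0]
cy = Float64.(y[keep])   # [2.0, 8.0, 11.0]

# Correlation over matched, complete observations only.
r = cor(cx, cy)
```

This allocates a mask and two copies, so it is less efficient than a lazy iterator, but it keeps the observations paired and relies only on functions that are part of Base and the Statistics standard library.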
