First, you are applying skipmissing too late.
But even if you were to correct it, cor wouldn’t work. This is a long-standing annoyance.
Skipping missing values in each vector separately is not a good idea, since the remaining observations are not guaranteed to be matched.
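A small sketch of what goes wrong when each vector is filtered on its own. The vectors and variable names are made up for illustration:

```julia
# Illustrative data: each vector has one missing value, in different positions.
x = [1.0, missing, 3.0]
y = [missing, 2.0, 3.0]

# Filtering each vector independently drops different rows from each.
sx_sep = collect(skipmissing(x))   # keeps rows 1 and 3 of x
sy_sep = collect(skipmissing(y))   # keeps rows 2 and 3 of y

# The lengths happen to agree, but the first pair combines row 1 of x with
# row 2 of y: two values that were never observed together.
mismatched = collect(zip(sx_sep, sy_sep))
```

Here the only row where both values are present is the third, so any statistic computed on the filtered vectors would be using a pair that does not exist in the data.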
We don’t have a good solution for this at the moment. Missings.jl (which is re-exported by DataFrames) provides skipmissings.
julia> using Missings, Statistics
julia> x = [rand() < .2 ? missing : rand() for i in 1:10];
julia> y = [rand() < .2 ? missing : rand() for i in 1:10];
julia> sx, sy = collect.(skipmissings(x, y));
julia> cor(sx, sy)
-0.32257867573052007
But skipmissings is not guaranteed to exist in the future. It’s deliberately documented as such even though Missings.jl is past 1.0.
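If you would rather not depend on skipmissings, one possible workaround is to build the mask of complete pairs yourself. This is only a sketch: cor_complete is a hypothetical helper name, not something provided by Statistics or Missings.jl, and it assumes two equal-length vectors.

```julia
using Statistics

# Hypothetical helper (not part of any package): correlation computed only
# over positions where both x and y are non-missing.
function cor_complete(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have the same length"))
    # true at index i only when neither x[i] nor y[i] is missing,
    # so the surviving observations stay matched.
    keep = .!(ismissing.(x) .| ismissing.(y))
    # identity.() narrows the element type from Union{Missing, T} to T.
    return cor(identity.(x[keep]), identity.(y[keep]))
end

# Usage: the complete pairs are (1, 2), (4, 8) and (5, 10), which lie on a
# straight line, so the correlation is 1 up to floating-point error.
x = [1.0, missing, 3.0, 4.0, 5.0]
y = [2.0, 4.0, missing, 8.0, 10.0]
r = cor_complete(x, y)
```

The design choice is to compute one shared mask and apply it to both vectors, which is the same matching that skipmissings(x, y) performs in the example above.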