I want to use Loess smoothing on data that has missing values in y (and then fill those gaps with the loess estimate). I had hoped to do this with pairwise from StatsBase, but did not manage.
The code below works but feels a bit cumbersome:
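```julia
using Loess, Missings  # allowmissing/disallowmissing are exported by Missings.jl

# Fit loess on the rows where both x and y are present, then return
# the fitted value at every x (so the missing y's can be filled).
function loess4Miss(x, y; span=0.3)
    ok = findall(@. !ismissing(x) & !ismissing(y))  # indices of complete pairs
    model = loess(disallowmissing(x[ok]), disallowmissing(y[ok]); span=span, degree=1)
    predict(model, x)  # assumes x itself has no missing values
end

x = rand(100);
y = sin.(x*π) + 0.03*randn(100)
y = allowmissing(y)
y[10] = missing
loess4Miss(x, y)
```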
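The function above returns a fitted value for every x, so the observed y's are replaced too. If the goal is to fill only the gaps, one option is to keep the missing-value bookkeeping separate from the smoother. Below is a minimal sketch; `fill_missing_y` and its `fit_predict(xs, ys, xnew)` argument are hypothetical names, not part of Loess.jl or Missings.jl, and it assumes x has no missing values.

```julia
# Hypothetical helper (not from any package): fill only the missing entries of `y`.
# `fit_predict(xs, ys, xnew)` must fit a smoother on the complete pairs (xs, ys)
# and return a Vector of predictions at `xnew`.
# Assumes `x` itself has no missing values.
function fill_missing_y(fit_predict, x::AbstractVector, y::AbstractVector)
    miss = ismissing.(y)                           # marks the y's to be filled
    filled = collect(Float64, coalesce.(y, NaN))   # NaN is only a placeholder
    any(miss) || return filled                     # nothing to fill
    xs = collect(Float64, x[.!miss])               # observed pairs only
    ys = collect(Float64, y[.!miss])
    filled[miss] = fit_predict(xs, ys, collect(Float64, x[miss]))
    return filled
end

# Usage with Loess.jl, mirroring the call in loess4Miss:
#   using Loess
#   fill_missing_y((xs, ys, xnew) -> predict(loess(xs, ys; span=0.3, degree=1), xnew), x, y)
```

Passing the smoother in as a function keeps the missing-data handling in one place, so it can be tested with a trivial smoother and reused with other fitting methods.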
Note that if x can be missing where y is not (and vice versa), you should really use the DataFrames version, because otherwise you can completely scramble your pairings.
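To make the pairing problem concrete, here is a small sketch using only Base: filtering each vector on its own shifts the rows out of alignment, while a single mask computed from both vectors keeps them paired (which is what the `ok` index in the code above does).

```julia
x = [1.0, missing, 3.0, 4.0]
y = [10.0, 20.0, missing, 40.0]

# Wrong: filtering each vector independently breaks the row pairing
xs_bad = collect(skipmissing(x))   # [1.0, 3.0, 4.0]
ys_bad = collect(skipmissing(y))   # [10.0, 20.0, 40.0]; 3.0 now pairs with 20.0

# Right: one mask built from both vectors keeps rows aligned
keep = .!ismissing.(x) .& .!ismissing.(y)
xs = collect(Float64, x[keep])     # [1.0, 4.0]
ys = collect(Float64, y[keep])     # [10.0, 40.0]
```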
But I also read somewhere that skipmissings (with an s) will be discontinued? So should one use the DataFrames way instead? It looks a bit like a detour, but maybe I just need to get used to it. Any “general” advice is welcome, e.g. from @bkamins. Btw, thanks for all your helpful comments and blog posts.
There is a discussion about a better design, but skipmissings will probably stay to avoid breaking changes.
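For reference, a minimal sketch of the skipmissings route (from Missings.jl): it returns one iterator per argument and skips index i in all of them whenever any argument is missing at i, so the pairs stay aligned. The resulting vectors could then be passed to `loess` as in the code earlier in the thread.

```julia
using Missings   # provides skipmissings

x = [0.1, 0.2, missing, 0.4, 0.5]
y = [1.0, missing, 3.0, 4.0, 5.0]

# Skip index i in both iterators if x[i] or y[i] is missing
xs_it, ys_it = skipmissings(x, y)
xs = collect(Float64, xs_it)   # [0.1, 0.4, 0.5]
ys = collect(Float64, ys_it)   # [1.0, 4.0, 5.0]
```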
My main comment is that I do not think you have to use DataFrames.jl, but I would recommend using some table-aware storage type that keeps the rows of logically connected vectors synchronized out of the box (as this is essentially what you need here). Such a design is, in my opinion, conceptually cleaner.
(In general, I think that is why the “data frame” concept has become so popular everywhere: it makes reasoning about such cases easier.)
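As a rough illustration of "table-aware storage" without any package, a vector of NamedTuples (one of the row-table forms that Tables.jl accepts) stores x and y together per row, so dropping incomplete rows cannot misalign them. This is a sketch, not the only option; with DataFrames.jl the analogous step would be `dropmissing(DataFrame(x = x, y = y))`.

```julia
x = [0.1, 0.2, missing, 0.4]
y = [1.0, missing, 3.0, 4.0]

# One NamedTuple per observation: x[i] and y[i] travel together
rows = [(x = xi, y = yi) for (xi, yi) in zip(x, y)]

# Drop incomplete rows as a unit
complete = filter(r -> !ismissing(r.x) && !ismissing(r.y), rows)

xs = [r.x for r in complete]   # [0.1, 0.4]
ys = [r.y for r in complete]   # [1.0, 4.0]
```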
Is the na.action mechanism really useful in R? I've never really seen people use it, since the default behavior of skipping missing values seems to be enough for everyone.