So I won’t get into my personal ideas about:
Because I have a lot of very strongly held opinions on those, but that’s more or less for another day and another context.
So, what does it take to do this? I’m thinking out loud:
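    using DataFrames

    struct Ignorable end
    const ignore = Ignorable()
    isignorable(x) = x isa Ignorable

    # Replace every missing in every column of df with ignore, in place.
    function miss2ignore!(df)
        for i in 1:ncol(df)
            col = df[!, i]
            T = nonmissingtype(eltype(col))
            # Build the column with an explicit eltype: replace(col, missing => ignore)
            # would widen it to Any for a user-defined singleton like Ignorable.
            df[!, i] = Union{Ignorable,T}[ismissing(v) ? ignore : v for v in col]
        end
        return df
    end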
Ok, so now it’s pretty easy to just replace all the missing with ignore. I guess the next part is rewriting summary stats functions? They’d look like:
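    import Statistics: mean

    function mean(x::Vector{Union{Ignorable,T}}) where T
        # Collect into a Vector{T}; filter would keep the Union eltype and
        # this method would then call itself forever.
        xig = T[v for v in x if !isignorable(v)]
        mean(xig)
    end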
...
This pattern is basically the same for every unary summary function, so it’d be good to have a macro @declareignore1 that writes this code for each element of an array. Then you could say:
    @declareignore1 [:mean,:median,:sum,:prod...]
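Still thinking out loud, here is a minimal sketch of what @declareignore1 could look like. The helper name dropignorable is made up for illustration, and the sketch assumes the functions being extended have already been brought in with import so that the generated methods extend them rather than shadow them. It only handles a literal array of quoted symbols.

```julia
import Statistics: mean, median
import Base: sum, prod

struct Ignorable end
const ignore = Ignorable()
isignorable(x) = x isa Ignorable

# Hypothetical helper: drop ignorable entries and narrow the eltype to T.
dropignorable(x::Vector{Union{Ignorable,T}}) where {T} =
    T[v for v in x if !isignorable(v)]

macro declareignore1(names)
    # `names` arrives as the expression `[:mean, :median, ...]`;
    # each element is a QuoteNode wrapping the function name.
    defs = Expr(:block)
    for q in names.args
        f = q.value
        push!(defs.args, quote
            function $f(x::Vector{Union{Ignorable,T}}) where {T}
                # The filtered vector has eltype T, so this call
                # dispatches to the ordinary method, not back to this one.
                $f(dropignorable(x))
            end
        end)
    end
    esc(defs)
end

@declareignore1 [:mean, :median, :sum, :prod]

x = Union{Ignorable,Float64}[1.0, ignore, 3.0]
mean(x)  # 2.0
sum(x)   # 4.0
```

A plain `for f in (...) @eval ... end` loop would do the same job without a macro; the macro just makes the declaration read the way it does above.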
For two-argument summary functions like cor, a different macro would presumably be needed, one that drops every pair where either entry (or both) is ignorable. That also seems pretty straightforward: you could preallocate a pair of vectors, iterate through skipping the ignorable entries, resize! the vectors, then run the function:
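    import Statistics: cor

    function cor(x::Vector{Union{Ignorable,T}}, y::Vector{Union{Ignorable,T}}) where T
        # Preallocate at full length, then trim after the loop.
        xx = Vector{T}(undef, length(x))
        yy = Vector{T}(undef, length(y))
        i = 1  # next free slot in xx and yy
        # eachindex(x, y) throws if the two vectors differ in length
        for j in eachindex(x, y)
            if isignorable(x[j]) || isignorable(y[j])
                continue
            else
                xx[i] = x[j]
                yy[i] = y[j]
                i += 1
            end
        end
        resize!(xx, i-1)
        resize!(yy, i-1)
        cor(xx, yy)
    end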
So basically write a macro that writes that, and then declare your cor and cov and whatever. (You could also do this somewhat functionally: create a function that does the filtering, then declare cor and cov as just calling the filtering function and applying cor, cov, etc. to the result.)
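A rough sketch of that functional variant, under the same assumptions as before: dropignorablepairs is a hypothetical helper name, and the two vectors are allowed to have different element types S and T, which is slightly looser than the single T used above.

```julia
import Statistics: cor, cov

struct Ignorable end
const ignore = Ignorable()
isignorable(x) = x isa Ignorable

# Hypothetical helper: keep only the positions where neither entry is ignorable.
function dropignorablepairs(x::Vector{Union{Ignorable,S}},
                            y::Vector{Union{Ignorable,T}}) where {S,T}
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have the same length"))
    keep = [!isignorable(a) && !isignorable(b) for (a, b) in zip(x, y)]
    xx = S[x[i] for i in eachindex(x) if keep[i]]
    yy = T[y[i] for i in eachindex(y) if keep[i]]
    return xx, yy
end

# Declare each two-argument function as "filter the pairs, then apply".
for f in (:cor, :cov)
    @eval function $f(x::Vector{Union{Ignorable,S}},
                      y::Vector{Union{Ignorable,T}}) where {S,T}
        $f(dropignorablepairs(x, y)...)
    end
end

x = Union{Ignorable,Float64}[1.0, 2.0, ignore, 4.0, 5.0]
y = Union{Ignorable,Float64}[2.0, 4.0, 6.0, ignore, 10.0]
cor(x, y)  # computed on the pairs (1,2), (2,4), (5,10) only
```

The filtering costs one extra pass for the keep mask compared to the preallocate-and-resize! loop, but the helper is written once and every two-argument function becomes a one-liner.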
I think after about 50-100 lines of code in either a small package or just a utility script you can include, you’ve got all the tools you’d normally need?
I guess now your big problem will come with regular arithmetic and comparisons and such?