Don’t get me wrong, I understand the usefulness of not skipping missing values. My point is that it should be an option, not the default behavior. Overall, this is quite opinionated, and this post was actually part of the thread "What I don’t like about Julia".
In some cases, you want to make sure you aren’t introducing mistakes, and you could even run that check at the final stage of the data analysis. However, even when a missing value is pointing out a mistake, right now I skip missings without thinking twice.
In general, given the distinction Julia makes between `NaN`, `nothing`, and `missing`, I think `missing` should have been treated as a special data type for data analysis. Or add a data type (let’s call it `undisclosed`) that has this behavior. Then you could replace `missing` with `undisclosed` in your dataset and choose whichever you prefer.
Also, note that it’s not just `skipmissing`. It also affects other aspects, like this example I gave of converting strings into numbers:
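using Missings # provides passmissing
x = ["0", "1", missing]
tryparse.(Int, x) # this errors: no tryparse method accepts missing
passmissing(tryparse).(Int, x) # returns [0, 1, missing]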
Or when you want to filter a data frame:
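x = [0, 1, missing]
x .== 0 # returns [true, false, missing]; using it as a filter mask errors
isequal.(x, 0) # returns [true, false, false], safe to use as a mask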
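To make the filtering issue concrete, here is a minimal sketch using a plain vector in place of a data frame column (the variable names `col`, `mask`, and `keep` are made up for illustration). It assumes only Base Julia and shows `coalesce` as one way to turn the three-valued comparison into a usable Boolean mask:

```julia
# A column with one missing entry (a stand-in for a data frame column).
col = [0, 1, missing]

# Comparison propagates missing, so the mask is three-valued.
mask = col .== 0                  # [true, false, missing]

# Indexing with `mask` directly (col[mask]) throws, because a missing
# entry cannot be interpreted as "keep" or "drop".

# Treat "unknown" as "drop" by replacing missing with false.
keep = coalesce.(mask, false)     # [true, false, false]

filtered = col[keep]              # keeps only the entry equal to 0
```

Whether an unknown comparison should count as `false` is exactly the kind of choice being argued about here, so the `false` passed to `coalesce` is a decision, not a neutral default.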
I even had to create my own functions for comparisons with `>=`, etc.
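A minimal sketch of what such a helper might look like, assuming you want a comparison involving `missing` to count as `false`. The names `missing_as_false` and `gte` are hypothetical, not the functions mentioned above and not part of any package:

```julia
# Hypothetical helper: wrap a two-argument comparison so that a missing
# result is replaced by false instead of propagating.
missing_as_false(op) = (a, b) -> coalesce(op(a, b), false)

# A >= that never returns missing.
gte = missing_as_false(>=)

col = [0, 1, missing]

# Broadcasting works as with the built-in operator.
at_least_one = gte.(col, 1)       # [false, true, false]
```

The same wrapper can be applied to `<`, `<=`, `>`, or `==`, which avoids writing one function per operator.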