Why are missing values not ignored by default?

It’s the data analyst’s job to ensure data integrity. The question “which observations contribute to this statistic” is something Julia, the language, can’t answer. The analyst should absolutely run additional robustness checks on how missing values are handled and on what the appropriate way to deal with them is.

The question is whether imposing skipmissing(...) or propagation on Boolean operations is the right way to go about that. It’s costly for users to write skipmissing every time they want to calculate a mean. I’m simply arguing that the cost isn’t always worth the benefit.
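
For concreteness, here is a minimal sketch of the two behaviours I'm talking about, assuming a recent Julia 1.x with the Statistics standard library. The vector `x` and the variable names are made up for illustration.

```julia
using Statistics  # standard library; provides mean

# Illustrative data with one missing observation
x = [1.0, missing, 3.0]

# By default the missing value propagates through the reduction
m_default = mean(x)               # missing

# The analyst has to opt in to dropping missing observations
m_skipped = mean(skipmissing(x))  # 2.0

# Boolean operators follow three-valued logic: the result is missing
# only when the missing operand could change the answer
a = true & missing    # missing
b = false & missing   # false
c = true | missing    # true

# Comparisons propagate as well
d = (missing == 1)    # missing
```

The upside is that nothing is dropped silently. The downside is that every reduction over a column that might contain missing needs the extra skipmissing call.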
