Why are missing values not ignored by default?

alfaromartino · November 26, 2023, 10:08pm

Don’t get me wrong, I understand the usefulness of not skipping missing values. My point is that this should be the option, not the default behavior. Admittedly, this is quite opinionated; the post was originally part of the thread "What I don’t like about Julia".

In some cases, you do want to make sure you aren’t introducing mistakes, and you could even run that check at the final stage of the data analysis. However, right now I skip missings without thinking twice, even when a missing value is pointing out a mistake.
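For context, here is a minimal illustration of the current default: reductions propagate `missing` unless the data is explicitly wrapped in `skipmissing`. Only Base and the `Statistics` standard library are used.

```julia
using Statistics  # for mean

x = [1, 2, missing, 4]

sum(x)                 # missing: the missing entry propagates
sum(skipmissing(x))    # 7
mean(skipmissing(x))   # 7/3 ≈ 2.333
```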

In general, given Julia’s distinction between NaN, nothing and missing, I think missing should have been treated as a special data type for data analysis. Alternatively, Julia could add a data type (let’s call it undisclosed) with this skip-by-default behavior. You could then replace missing with undisclosed in your dataset and choose whichever behavior you prefer.
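The `undisclosed` type is only a proposal; nothing like it exists in Base or Missings.jl. As a rough sketch, assuming the intended behavior is "comparisons with it are false and sums skip it", such a type could be approximated in user code. The `Undisclosed` type, the `undisclosed` constant, and every method below are hypothetical definitions made up for illustration.

```julia
# Hypothetical sketch: `Undisclosed` is NOT part of Base or Missings.jl.
# It models a missing entry that is ignored by default instead of propagating.
struct Undisclosed end
const undisclosed = Undisclosed()

Base.show(io::IO, ::Undisclosed) = print(io, "undisclosed")

# Comparisons involving `undisclosed` return false, so filters simply drop it.
# For a non-Real type like this, Base's generic fallbacks define >, <= and >=
# in terms of < and ==, so they inherit the same behavior.
Base.:(==)(::Undisclosed, ::Any) = false
Base.:(==)(::Any, ::Undisclosed) = false
Base.:(==)(::Undisclosed, ::Undisclosed) = false
Base.:(<)(::Undisclosed, ::Any) = false
Base.:(<)(::Any, ::Undisclosed) = false
Base.:(<)(::Undisclosed, ::Undisclosed) = false

# Addition skips `undisclosed`, so `sum` ignores it.
Base.:(+)(x::Number, ::Undisclosed) = x
Base.:(+)(::Undisclosed, x::Number) = x
Base.:(+)(::Undisclosed, ::Undisclosed) = undisclosed

x = [0, 1, 2, undisclosed]   # a Vector{Any}

x[x .== 0]   # keeps only the 0
x[x .>= 1]   # keeps 1 and 2
sum(x)       # 3
```

This only covers comparisons and `+`. For example, `mean` would still count `undisclosed` entries in the denominator, so a real design would need considerably more than this.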

Also, note that it’s not just skipmissing. The default also affects other operations, such as converting strings into numbers:

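```julia
using Missings  # provides passmissing

x = ["0", "1", missing]
tryparse.(Int, x)               # errors: there is no tryparse method for missing
passmissing(tryparse).(Int, x)  # returns [0, 1, missing]
```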
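If you would rather avoid the Missings.jl dependency for the string-parsing case, a comprehension does the same job in Base. A minimal sketch; note that `tryparse` still returns `nothing` for strings that aren't valid integers.

```julia
x = ["0", "1", missing]

# Pass `missing` through untouched and parse everything else (Base only).
parsed = [ismissing(s) ? missing : tryparse(Int, s) for s in x]
```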

or when you want to filter a DataFrame:

```julia
x = [0, 1, missing]
x[x .== 0]         # errors: the mask x .== 0 is [true, false, missing]
x[isequal.(x, 0)]  # works: isequal(missing, 0) is false
```
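Another Base-only workaround for the filtering case is to turn any `missing` in the Boolean mask into `false` with `coalesce` before indexing. (For DataFrames specifically, `subset` accepts a `skipmissing=true` keyword that does this for you.)

```julia
x = [0, 1, missing]

# `x .== 0` is [true, false, missing]; coalesce turns the missing into false.
mask = coalesce.(x .== 0, false)

x[mask]   # keeps only the 0
```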

I even had to write my own functions for comparisons such as >=.
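The hand-written comparison helpers mentioned above might look something like the following. This is only a sketch: `falsemissing`, `geq`, and `eq` are hypothetical names, not functions from Base or any package. Wrapping the operator once avoids writing a separate helper for each comparison.

```julia
# Hypothetical helper (not from any package): wrap a comparison so that a
# `missing` result is treated as `false`.
falsemissing(f) = (args...) -> coalesce(f(args...), false)

# Hypothetical names for the wrapped operators.
geq = falsemissing(>=)
eq  = falsemissing(==)

x = [0, 1, 2, missing]

x[geq.(x, 1)]   # keeps 1 and 2
x[eq.(x, 0)]    # keeps only the 0
```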
