I love Julia but am often puzzled about why Base does not make use of some of the things that make Julia great.
In Base, the definition of missing is:
struct Missing end
const missing = Missing()
Methods operating on missing values are, in turn, defined on the ::Missing type signature.
I am puzzled why Base doesn’t implement Holy Traits here. Why not define:
abstract type AbstractMissing end
struct Missing <: AbstractMissing end
const missing = Missing()
and then define methods on the ::AbstractMissing type signature, allowing developers to create alternative types of missing values? This takes one additional line of code (and would require refactoring the type signatures of existing methods).
This might not seem like a big deal, but it is. “missing” is treated as equivalent to “I don’t know what the value actually is”. But there are different ways in which “I do not know” can arise, and the differences can really matter.
Suppose we have a panel of data:
id  month  x        Lag(x)   Lead(x)
1   1      100      missing  missing
1   2      missing  100      120
1   3      120      missing  105
1   4      105      120      missing
The missing in column x probably means that the data was not entered, or was not applicable, in that period for person 1. The missing in the first row of column Lag(x) means that person 1 had not yet entered the sample (had not joined the company, had not enrolled in the school, etc.). The missing in the last row of column Lead(x) means that person 1 had exited the sample. These are not equivalent, and often must be modelled in different ways.
If x were a categorical variable, and you were estimating a transition matrix, ignoring differences in the type of missingness could cause serious problems, depending on the question.
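To make the distinction concrete, here is a minimal sketch of lag and lead functions that mark out-of-sample positions differently from ordinary missing values. The names `BeforeSample`, `AfterSample`, `lag_sketch`, and `lead_sketch` are hypothetical and exist neither in Base nor in any package I am aware of. The sketch assumes a single id whose observations are already sorted by month.

```julia
# Hypothetical sentinel types for the two kinds of out-of-sample gaps.
struct BeforeSample end
struct AfterSample end

# Lag within one id's time-ordered series: the first period has no predecessor.
function lag_sketch(x::Vector)
    out = Vector{Union{eltype(x), BeforeSample}}(undef, length(x))
    for i in eachindex(x)
        out[i] = i == 1 ? BeforeSample() : x[i - 1]
    end
    return out
end

# Lead within one id's time-ordered series: the last period has no successor.
function lead_sketch(x::Vector)
    out = Vector{Union{eltype(x), AfterSample}}(undef, length(x))
    for i in eachindex(x)
        out[i] = i == length(x) ? AfterSample() : x[i + 1]
    end
    return out
end

# Column x for person 1 from the panel above.
x = [100, missing, 120, 105]
lagged = lag_sketch(x)
leads = lead_sketch(x)
```

Here a missing that is inherited from x stays an ordinary `missing`, while the first lag and the last lead carry their own marker. The cost is that none of Base's missing-aware functions (`skipmissing`, `ismissing`, and so on) know about the two sentinel types.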
Of course, there are many ways to keep track of this information, but the most direct is a type hierarchy:
In Base:
abstract type AbstractMissing end
struct Missing <: AbstractMissing end
const missing = Missing()
and functions like skipmissing would be defined on AbstractMissing.
Now we can keep track of relevant missingness information without throwing away the functionality developed in Base for missing. We could do so in a separate package with:
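As a rough illustration of how this would work, the sketch below imitates the proposal outside of Base. Since Base has no AbstractMissing, every name here (`AbstractMissingSketch`, `NotRecorded`, `NotYetInSample`, `ismissing_sketch`, `skipmissing_sketch`) is a hypothetical stand-in, not an existing API.

```julia
# Hypothetical stand-in for the proposed Base.AbstractMissing.
abstract type AbstractMissingSketch end

# Two flavours of missingness sharing one abstract supertype.
struct NotRecorded <: AbstractMissingSketch end
struct NotYetInSample <: AbstractMissingSketch end

const not_recorded = NotRecorded()
const not_yet_in_sample = NotYetInSample()

# Written once against the abstract type, so every subtype is covered.
ismissing_sketch(::AbstractMissingSketch) = true
ismissing_sketch(::Any) = false

# A skipmissing-like lazy filter, also written only once.
skipmissing_sketch(itr) = (v for v in itr if !ismissing_sketch(v))

data = [100, not_recorded, 120, not_yet_in_sample, 105]
total = sum(skipmissing_sketch(data))
```

Any new subtype added later is skipped automatically, while code that cares about the reason for missingness can still dispatch on the concrete type.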
abstract type AbstractOutOfSample <: AbstractMissing end
struct BeforeSampleMissing <: AbstractOutOfSample end
const bs_missing = BeforeSampleMissing()
struct AfterSampleMissing <: AbstractOutOfSample end
const as_missing = AfterSampleMissing()
I am writing some packages that handle (e.g. simulate, estimate) data indexed at different levels and perform operations on data that take into account how the data is indexed (e.g. taking lags of a panel).
This is much harder than it needs to be because of the absence of AbstractMissing.
Related question about missings:
typeof(["a", missing]) == Vector{Union{String, Missing}}
struct Bob end
const bob = Bob()
typeof(["a", bob]) == Vector{Any}
Why does Julia infer a Union element type when Missing is one of the types, but an Any element type when Bob is one of the types? This is surprising to me given that bob is defined analogously to missing in Base.
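A possible explanation, as far as I understand it: array literals pick their element type through promotion, and Base defines a `promote_rule` method for `Missing` (and for `Nothing`) that returns `Union{T, Missing}`. Without such a rule, two unrelated types fall back to their common supertype, which here is `Any`. The sketch below adds an analogous rule for `Bob`. It only covers the simple two-type case and does not replicate everything Base does for `Missing`.

```julia
struct Bob end
const bob = Bob()

# With no promotion rule, String and Bob only share the supertype Any.
before = typeof(["a", bob])

# Opt in to Missing-like promotion for our own type.
# This is not type piracy because Bob is defined here.
Base.promote_rule(::Type{Bob}, ::Type{T}) where {T} = Union{T, Bob}

# The array literal now promotes to a small Union element type.
after = typeof(["a", bob])

@show before
@show after
```

So the different behaviour does not come from how the struct is declared but from the extra promotion method that Base attaches to `Missing`.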