AbstractMissing

I love Julia but am often puzzled about why Base does not make use of some of the things that make Julia great.

In Base, the definition of missing is:

struct Missing end 
const missing = Missing() 

Methods operating on missing values, in turn, are defined with ::Missing in their type signatures.
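For concreteness, here is a small sketch of what those ::Missing methods do today. It only uses functions exported from Base; the particular calls are just the ones I picked for illustration.

```julia
# Arithmetic and comparison methods defined on ::Missing propagate missingness.
missing + 1                      # missing
missing == 1                     # missing (three-valued logic, not false)
isequal(missing, missing)        # true

# Helpers that test for, skip, or replace the single Missing instance.
ismissing(missing)               # true
collect(skipmissing([1, missing, 3]))   # [1, 3]
coalesce(missing, 0)             # 0

# The methods are attached to the concrete type Missing.
hasmethod(+, Tuple{Missing, Number})    # true
```

Every one of these signatures names Missing directly, which is why a user-defined missing-like type gets none of this behavior for free.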

I am absolutely puzzled why Base doesn’t implement Holy Traits here. Why not define:

abstract type AbstractMissing end 
struct Missing <: AbstractMissing end 
const missing = Missing() 

and then define methods on the ::AbstractMissing type signature, allowing developers to create alternative types of missing values? This takes one additional line of code (and would require refactoring the type signatures of existing methods).

This might not seem like a big deal, but it is. “missing” is treated as equivalent to “I don’t know what the value actually is”. But there are different ways of not knowing, and the differences can really matter.

Suppose we have a panel of data:

id  month  x        Lag(x)   Lead(x)
1   1      100      missing  missing
1   2      missing  100      120
1   3      120      missing  105
1   4      105      120      missing

The missing in column x probably means that the data was not entered or was not applicable in that period for person 1. The missing in the first row of column Lag(x) means that person 1 had not yet entered the sample (had not joined the company, had not enrolled in the school, etc.). The missing in the last row of column Lead(x) means that the person had exited the sample. These are not equivalent, and often must be modelled in different ways.

If x were a categorical variable, and you were estimating a transition matrix, ignoring differences in the type of missingness could cause serious problems, depending on the question.

Of course, there are many ways to keep track of this information, but the most direct is a type hierarchy:

In Base:

abstract type AbstractMissing end 
struct Missing <: AbstractMissing end 
const missing = Missing() 

and functions like skipmissing would be defined on AbstractMissing.

Now we can keep track of relevant missingness information without throwing away the functionality developed in Base for missing. Do so in a separate package with:

abstract type AbstractOutOfSample <: AbstractMissing end 
struct BeforeSampleMissing <: AbstractOutOfSample end 
const bs_missing = BeforeSampleMissing() 
struct AfterSampleMissing <: AbstractOutOfSample end 
const as_missing = AfterSampleMissing()
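To make the idea concrete, here is a minimal self-contained sketch. Since Base.Missing cannot be re-parented from user code, it uses a stand-in hierarchy: AbstractMissingSketch, DataMissing, d_missing, ismissinglike, lag_sketch, lead_sketch and observed are all hypothetical names invented for this example, not anything in Base or an existing package.

```julia
# Stand-in for the proposed Base hierarchy (hypothetical names throughout).
abstract type AbstractMissingSketch end
struct DataMissing <: AbstractMissingSketch end            # value was not recorded

abstract type AbstractOutOfSample <: AbstractMissingSketch end
struct BeforeSampleMissing <: AbstractOutOfSample end      # unit not yet in the sample
struct AfterSampleMissing <: AbstractOutOfSample end       # unit has left the sample

const d_missing  = DataMissing()
const bs_missing = BeforeSampleMissing()
const as_missing = AfterSampleMissing()

# Generic behavior is written once against the abstract type.
ismissinglike(::AbstractMissingSketch) = true
ismissinglike(::Any) = false

# Lag and lead for one unit's series; positions outside the sample get their own marker.
lag_sketch(x::AbstractVector) =
    [i == firstindex(x) ? bs_missing : x[i - 1] for i in eachindex(x)]
lead_sketch(x::AbstractVector) =
    [i == lastindex(x) ? as_missing : x[i + 1] for i in eachindex(x)]

# Keep only the values that are actually observed.
observed(v) = [e for e in v if !ismissinglike(e)]

x  = [100, d_missing, 120, 105]   # column x from the panel above
lx = lag_sketch(x)                # first entry is bs_missing, third is d_missing
ld = lead_sketch(x)               # last entry is as_missing

observed(lx)                                  # [100, 120]
count(e -> e isa AbstractOutOfSample, lx)     # 1
```

Code that does not care about the distinction dispatches on the abstract type, while code that does care (for example a transition-matrix estimator) can dispatch on the concrete subtypes.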

I am writing some packages that handle (e.g. simulate, estimate) data indexed at different levels and perform operations on data that take into account how the data is indexed (e.g. taking lags of a panel).
This is much harder than it needs to be because of the absence of AbstractMissing.

Related question about missings:

typeof(["a", missing]) == Vector{Union{String, Missing}}
struct Bob end 
const bob = Bob()
typeof(["a", bob]) == Vector{Any} 

Why does Julia infer a Union element type when Missing is one of the types, but an Any element type when Bob is one of the types? This is surprising to me, given that bob is defined analogously to missing in Base.

That’s not Holy Traits. That’s just a normal type hierarchy
(though in general Base uses Holy Traits sparingly, as a last resort when a simple abstract type isn’t enough).


Anyway, the issues of AbstractMissing and narrow unions are distinct.
Narrow unions are desirable for other types too.
Narrow unions are something that ideally would be declared with a trait:

UnioningStyle(::Any) = WideUnions()
UnioningStyle(::Missing) = NarrowUnions()
UnioningStyle(::Nothing) = NarrowUnions()

UnioningStyle(::ChainRulesCore.AbstractZero) = NarrowUnions()
UnioningStyle(::ChainRulesCore.NotImplemented) = NarrowUnions()
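Here is a rough, runnable sketch of how such a trait could work. The first part shows what Base does today (Base.promote_typejoin is an internal, unexported function). The second part is purely hypothetical: UnioningStyle, WideUnions, NarrowUnions and sketch_join do not exist in Base, and I dispatch the trait on types rather than instances only because that is convenient for joining element types.

```julia
struct Bob end
const bob = Bob()

# What Base does today: Missing (and Nothing) are special-cased, Bob is not.
typeof(["a", missing])                    # Vector{Union{Missing, String}}
typeof(["a", bob])                        # Vector{Any}
Base.promote_typejoin(String, Missing)    # Union{Missing, String}
Base.promote_typejoin(String, Bob)        # Any

# Hypothetical opt-in trait (not part of Base).
abstract type UnioningStyle end
struct WideUnions <: UnioningStyle end
struct NarrowUnions <: UnioningStyle end

UnioningStyle(::Type) = WideUnions()                 # default: widen as usual
UnioningStyle(::Type{Missing}) = NarrowUnions()
UnioningStyle(::Type{Nothing}) = NarrowUnions()
UnioningStyle(::Type{Bob}) = NarrowUnions()          # a package opts its own type in

# Hypothetical join: look up the trait of both types, then dispatch on the result.
sketch_join(A::Type, B::Type) = sketch_join(UnioningStyle(A), UnioningStyle(B), A, B)
sketch_join(::WideUnions, ::WideUnions, A::Type, B::Type) = typejoin(A, B)
sketch_join(::UnioningStyle, ::UnioningStyle, A::Type, B::Type) = Union{A, B}

sketch_join(String, Bob)       # Union{Bob, String}
sketch_join(Int, Float64)      # Real
```

The point of the trait is that the opt-in is a single method definition per type, with no need for the type to sit under any particular abstract supertype.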

The issue about this is:

You might be interested in the experimental package TypedMissings.jl which does something similar.

But I think to answer your question, there is no agreed-upon set of missing types, and using dispatch for all this might be tough on the compiler. So it’s hard to implement.

Thank you.

How hard are type hierarchies for the compiler in general?

For example, if we redefined

struct Missing <: AbstractMissing end 

and redefined all methods currently defined on ::Missing to be defined on ::AbstractMissing,
but added no additional types to the AbstractMissing type hierarchy, by how much would that slow down compilation? In general, I thought it was a good design pattern in Julia to use abstract types if it seems at all plausible that others might want to extend them. Are the compilation costs something that package developers in general should take into account? Alternatively, if using abstract types causes compilation difficulties only after additional types are added to the type hierarchy, then it would seem that packages could independently take into account the compilation costs of extending the AbstractMissing type, and these compilation costs should not be considered when defining types in Base.

cc @nalimilan

AFAIU those declarations don’t make any difference, as many functions are not typed at all (they accept Any). What does matter is the type of the object given as input at execution time.

I don’t see anything wrong about what you propose, and I’m curious to see what was the design choice behind that.

The problem isn’t the cost of abstract types for the compiler; it’s that the behavior of missing is currently special-cased in a few places. The most problematic one is JuliaLang/julia issue #38241, “Allow custom types to get narrow Union type with map (like `promote_typejoin` for `Missing`/`Nothing`)”, as @oxinabox mentioned. There are many other places where x === missing is used instead of ismissing(x); I made a PR to change these (https://github.com/JuliaLang/julia/pull/44407), but it was blocked by a performance regression that would need the attention of compiler devs to fix.
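A small sketch of why the x === missing pattern matters for custom types. BeforeSampleMissing, bs_missing and the two count_* helpers are hypothetical names for this example, and extending Base.ismissing for one's own type is shown only as an assumption about how a package might opt in.

```julia
struct BeforeSampleMissing end                 # hypothetical custom missing-like type
const bs_missing = BeforeSampleMissing()

# ismissing is a generic function, so a package can add a method for its own type.
Base.ismissing(::BeforeSampleMissing) = true

ismissing(bs_missing)        # true
bs_missing === missing       # false: === tests identity and cannot be extended

# The same check written both ways gives different answers on mixed data.
count_missing_identity(v) = count(x -> x === missing, v)
count_missing_generic(v)  = count(ismissing, v)

v = Any[1, missing, bs_missing]
count_missing_identity(v)    # 1
count_missing_generic(v)     # 2
```

Any code path that uses the identity check silently ignores the custom type, no matter what methods the package defines.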

Anyway, you can have a look at the end of TypedMissings.jl (main branch of nalimilan/TypedMissings.jl on GitHub) to see which methods currently have to be overwritten to get a custom missing type to work.

If there were AbstractMissing, I’m not sure whether it should have the propagating behavior that Missing does in so many functions.