Status of question mark syntax for missing values

I remember seeing a discussion somewhere about using the syntax T? to mean Union{T,Missing}. What was the outcome of this discussion? I personally find the syntax very clever, and it would help clean up a lot of code where we need to pre-allocate memory with missing values.

buff = Array{Union{T,Missing},2}(undef, 100, 100) # too verbose (T is some element type)
buff = Array{T?,2}(undef, 100, 100) # much cleaner (proposed syntax)
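
Until a shorthand like T? exists, a parametric type alias recovers some of the brevity. The following is only a sketch: `MaybeMissing` is a made-up alias name, not something Base or any package provides, and `Float64` stands in for whatever element type T is.

```julia
# `MaybeMissing` is a hypothetical alias name, not part of Base.
const MaybeMissing{T} = Union{T,Missing}

# Preallocate a 100×100 buffer whose elements may be missing.
buff = Matrix{MaybeMissing{Float64}}(undef, 100, 100)
fill!(buff, missing)   # start with every entry missing
buff[1, 1] = 3.14      # fill in values as they become available
```

The alias is an ordinary union underneath, so `eltype(buff)` is still `Union{Missing, Float64}`.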

I remember also that some people started discussing the semantics of T?, and suggested that it could also mean Union{T,Nothing}. I personally find this type not very useful. The only places it shows up are in method signatures with optional arguments, and maybe in optional fields of a struct. The vast majority of applications in the scientific domain seem to handle missing values more often than the “optional” semantics.

Could you please share your thoughts and plans about this shortcut syntax? Is there any chance it will make it into Julia v1.3?

I am starting to treat missing values seriously in my packages, and my code can become much uglier with the Union{T,Missing} everywhere.

For a while I was pretty ambivalent about whether we should use T? for Union{T, Nothing} or for Union{T, Missing}. The code I read and write has much more Union{T, Nothing} in it than Union{T, Missing}. However, I’ve come back around to wanting it to mean Union{T, Missing} for one very key reason: we need a way to express lifting of functions to the Union{T, Missing} domain. What I mean by that is that instead of getting a new feature request every couple of days for missing support in the atanh function or whatever, I’d like to just make atanh? lift the atanh function so that it returns missing when any of its arguments are missing. We don’t need this feature for nothing, since the whole point of nothing is that it doesn’t lift, whereas the point of missing is to lift. We might also want a @? macro similar to the @. macro, which annotates all function calls with ? lifting.
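
To make "lifting" concrete, here is a minimal sketch of what f? could desugar to, written as an ordinary higher-order function. The name `lift` is hypothetical; Missings.jl's `passmissing` plays a similar role. `uppercase` is used as the example because it has no method for `missing`.

```julia
# `lift` is a hypothetical helper: it wraps `f` so that the result is
# `missing` whenever any positional argument is `missing`.
lift(f) = (args...) -> any(ismissing, args) ? missing : f(args...)

lifted_uppercase = lift(uppercase)

a = lifted_uppercase("abc")      # calls uppercase("abc")
b = lifted_uppercase(missing)    # short-circuits; uppercase(missing) would throw a MethodError
c = lift(repeat)("ab", missing)  # works for several arguments too
```

Here `a` is `"ABC"`, while `b` and `c` are both `missing`, because the wrapped function is never called when an argument is missing.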

How do f? and T? mesh? Quite well, actually, if you view the T constructor as a function that returns values of type T—then T? is a function that returns values of type Union{T, Missing} which is exactly what function lifting is supposed to do. Of course, you still want this to be represented as the union type, Union{T, Missing}, so you just need to make sure that the type, when called, does the lifted version of T. @jameson had some thoughts on this and might be able to say a bit more.
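
The "lifted constructor" reading can be sketched the same way. This is only an illustration under assumptions: `lifted` is a hypothetical name, and it returns a closure rather than the union type itself, so it shows the call behaviour of T? but not its representation as Union{T, Missing}.

```julia
# `lifted` is a hypothetical helper: given a type T, return a function that
# constructs a T, or passes `missing` through untouched.
lifted(::Type{T}) where {T} = (args...) -> any(ismissing, args) ? missing : T(args...)

to_float = lifted(Float64)

v1 = to_float(1)         # constructs Float64(1)
v2 = to_float(missing)   # propagates missing without calling Float64
```

Both results are values of type Union{Float64, Missing}, which is the return type that T? is meant to express.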


since the whole point of nothing is that it doesn’t lift

The point is to ensure it propagates explicitly, tending to poison results uniformly and potentially lift any function over it with explicit syntax.

I’m almost ready to reveal my heretical thought that we should define const nothing = missing, and then add all this syntax for it. But perhaps give it another year before I say a bit more.


An interesting point. If I understand this correctly, the question is when we would want a Union{Nothing, Missing} or a Union{Nothing, Missing, T}. The only common pattern where it occurs is:

julia> using Missings

julia> x = ["a", "b", missing]
3-element Array{Union{Missing, String},1}:
 "a"
 "b"
 missing

julia> passmissing(match).(r"a", x)
3-element Array{Any,1}:
 RegexMatch("a")
 nothing
 missing

AFAICT
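
The same three-way result can be reproduced with Base alone, which makes the distinct meanings of the two sentinels easier to see. This is a sketch: `classify` is a hypothetical helper name, and the comprehension lifts `match` by hand instead of using `passmissing`.

```julia
x = ["a", "b", missing]

# Lift `match` by hand: skip the call entirely when the input is missing.
results = [ismissing(s) ? missing : match(r"a", s) for s in x]

# Each element is a RegexMatch, `nothing` (no match), or `missing` (no input).
classify(r) = ismissing(r) ? "no input" :
              r === nothing ? "no match" : "matched $(r.match)"

labels = classify.(results)
```

`labels` comes out as `["matched a", "no match", "no input"]`: `nothing` reports that the regex ran and found no match, while `missing` reports that there was no string to search.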

Another use of the ? syntax (if we decided to go with missing for it) is that we could then print ? (possibly in a different color in the REPL) instead of missing when we show tabular data, similar to the change in how we now show Bool in some cases.

How about defining ? to be an ordinary function (when it’s not used as the ternary conditional)? That would let you use a different ? locally, as you can with, e.g., *:

let ? = passmissing
    match?.(r"a", x)
end

It may also be nice to be able to define arbitrary lifting functions using, say, Unicode combining characters:

const ?̃ = mylift

f?̃(x)