Without getting into the discussion of data scientists vs software engineers: what exactly would you use T? (meaning Union{T, Nothing}) for in practice?
Union{T, Nothing} is very common for field types and function argument types. Wasting such a nice syntax on a type that most people do not use (Missing) would be quite unfortunate in my opinion.
I explicitly wanted to avoid the discussion of which was most used / common (it depends on what you use Julia for), whether data scientists or software engineers were first class users etc., but instead to discuss what you would use it for. Would you use it in struct definitions?
struct Mystruct
    x::Int?
end
I think that’s not so nice, personally.
My argument is that it’s mostly used for controlling function dispatch, and that f(x::Union{T, Nothing}) = ... is a lot less common than f(x::Union{T, Missing}) = ... because in the first case you would often either dispatch on T and Nothing separately or use duck typing.
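To make the dispatch argument concrete, here is a minimal sketch of the two styles. The function names `describe` and `double` are made up for illustration and do not come from any package.

```julia
# Nothing case: usually handled with separate methods, so no Union is needed.
describe(::Nothing) = "nothing here"
describe(x::Int) = "got $x"

# Missing case: one method typically accepts both, because most operations
# propagate missing, so the Union shows up in the signature.
double(x::Union{Int, Missing}) = 2x

describe(nothing)   # takes the Nothing method
describe(3)         # takes the Int method
double(4)           # 8
double(missing)     # missing, since 2 * missing propagates
```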
In some small projects, I’ve defined |(::Type{A}, ::Type{B}) where {A, B} = Union{A, B} which always felt really nice. The main annoyance was that you couldn’t do
struct Foo
    x::Int|Float64
end
because of the lowering precedence/greediness; instead, it had to be:
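Presumably this means the parenthesized form, x::(Int|Float64). Below is a minimal sketch, assuming the | definition quoted above. Note that adding a method to Base's | for types you do not own is type piracy, so this is only reasonable in small, private projects.

```julia
# Assumed definition from the post above, written as an explicit Base method.
Base.:|(::Type{A}, ::Type{B}) where {A, B} = Union{A, B}

struct Foo
    # Parentheses are needed so that | applies to the two types
    # rather than to the expression x::Int.
    x::(Int|Float64)
end

Foo(1)      # stores an Int
Foo(1.5)    # stores a Float64

# Chaining also works, because a | b | c parses as (a | b) | c.
const ThreeWay = Float64|Missing|Nothing
```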
It is explicit enough to handle every case, while still not appearing overly verbose, i.e. it is easy enough to write Int|Missing without requiring yet another special alias that is specific to Missing and different for Nothing.
It also doesn’t bother me personally to need parentheses due to precedence; in fact I prefer it, because my brain also parses x::Int|Float64 as (x::Int)|Float64.
Without being too hyperbolic, I think Julia will lose the data science crowd if there’s no convenient syntax for arrays that may contain missing. To be honest, I still do some of my data analysis in R because dealing with missings, especially writing functions for them, is so annoying.
I should probably use missings more, but they are a bit of a hassle. In contrast, NaNs are really easy to handle (and most of my data is Float64 anyway), so I am stuck with that old-fashioned approach.
From my perspective (finance/stats), missings should be as simple as NaNs, that is, Union{T,Missing} should be implicit in T.
NaNs are implemented in hardware; missing is not so lucky…
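For comparison, a small sketch of how the two approaches look with Base functions only (the variable names are arbitrary):

```julia
a = [1.0, NaN, 3.0]        # NaN lives inside Float64, so the eltype stays Float64
b = [1.0, missing, 3.0]    # missing widens the eltype to Union{Missing, Float64}

eltype(a)
eltype(b)

# Both propagate through reductions by default...
isnan(sum(a))
ismissing(sum(b))

# ...and both need an explicit step to be skipped.
sum(filter(!isnan, a))
sum(skipmissing(b))
```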
I’m all for a simplified syntax, btw. Implementing | for Float64|Missing seems like a great candidate. I also really like using ? for one or the other, but wouldn’t know which. And did no one bring up the case Float64|Missing|Nothing yet?
Still, the handling of missings could be easier. My use case: I load some data that has no missings, do some data wrangling, and then want to set x[1001,37]=missing. I believe the current way of doing it is to first do convert(Array{Union{Float64,Missing}},x). Making this simpler (convenience functions? macros?) would help.
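The conversion step described above, as a minimal sketch with a smaller array so it is easy to try (the sizes and indices are arbitrary):

```julia
x = rand(3, 4)                 # plain Matrix{Float64}, cannot hold missing

# Widen the element type; this allocates a new array with the same contents.
y = convert(Array{Union{Float64, Missing}}, x)

y[2, 3] = missing              # now allowed
```

Assigning missing into x directly would throw an error, because missing cannot be converted to Float64.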
Additionally, while I personally like the Int|Missing syntax, my experience with casual data scientists is that they would prefer something shorter. Exactly for this reason, in DataFrames.jl we use ⍰ when printing the column type when displaying a data frame, as otherwise it was simply too long.
If T? is taken for Union{T, Nothing} (I get the arguments behind it) then maybe we could choose some other short signal of missingness, e.g. a double ?? (it is not ideal though) or |??
Code that needs to do this kind of stuff in e.g. Tables.buildcolumns ends up defining things along the lines of setindex_or_widen! or push_or_widen! that check the eltype and, if it’s compatible, do the operation in place and return the first argument; otherwise they expand the type and return a different object. Maybe there could be some macro that makes this less clumsy? Say some
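A rough sketch of what such a helper might look like. This is an illustration of the pattern only, not the actual Tables.jl code, and the widening rule (a plain Union of the old eltype and the new value's type) is an assumption.

```julia
# Hypothetical helper: set dest[I...] = val in place if the eltype allows it,
# otherwise copy into a wider array and return that instead.
function setindex_or_widen!(dest::AbstractArray{T}, val, I...) where {T}
    if val isa T
        dest[I...] = val
        return dest                      # same object, mutated in place
    else
        S = Union{T, typeof(val)}        # assumed widening rule
        widened = similar(dest, S)       # same shape, wider eltype
        copyto!(widened, dest)
        widened[I...] = val
        return widened                   # a different object
    end
end

x = [1.0 2.0; 3.0 4.0]
x = setindex_or_widen!(x, missing, 1, 2)   # rebind, since the result may be a new array
```

The caller always has to rebind the result, which is the clumsy part.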
Personally, I feel that T? meaning Union{T, Missing} or Union{T, Nothing} would decrease the legibility and approachability of code. I like the T|Missing solution quite a bit but for people who complain that’s still too much writing, I think we should just allow ? to be used in variable / constant names such that one is free to define
const Int? = Int|Missing
or
const Int? = Int|Nothing
if that’s what they really need, but it seems like a bad thing to provide by default from base.
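Since ? is not allowed in identifiers today, the closest thing that works now is an ordinary alias. A minimal sketch, where the names `OrMissing`, `OrNothing` and `Reading` are made up for illustration:

```julia
# User-defined parametric aliases; nothing here comes from Base.
const OrMissing{T} = Union{T, Missing}
const OrNothing{T} = Union{T, Nothing}

struct Reading
    value::OrMissing{Float64}
    note::OrNothing{String}
end

r = Reading(missing, nothing)
```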