Aliases for Union{T, Nothing} and Union{T, Missing}?

That’s a good argument. What’s the opposing argument that led to the postponement of a decision? Simply caution?

One idea is then to suggest T? for Union{T, Missing}, and keep an eye out for something clever for Nothing, if it turns out to be useful.

2 Likes

Has there been any progress/further discussion on this? FWIW, I really like T? as well.

2 Likes

Me too. For Union{T, Nothing} :wink:

You used to be able to do

const ?{T} = Union{T, Nothing}

(see "use nothing instead of Nullable" by KristofferC, Pull Request #90 on JuliaWeb/GitHub.jl). Beautiful.
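As the post says, `?` is no longer accepted as an identifier, so that exact alias does not parse in current Julia. Here is a minimal sketch of the same idea using an ordinary name. `Opt` and `first_negative` are hypothetical names chosen for illustration and are not part of Base.

```julia
# Hypothetical alias: `?` itself is not a valid identifier in current Julia.
const Opt{T} = Union{T, Nothing}

# Return the index of the first negative entry, or `nothing` if there is none.
function first_negative(v::AbstractVector{<:Real})::Opt{Int}
    for (i, x) in pairs(v)
        x < 0 && return i
    end
    return nothing
end

first_negative([3, -1, 2])   # 2
first_negative([1, 2, 3])    # nothing
```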

5 Likes

Without getting into the discussion of data scientists vs software engineers: what exactly would you use T? meaning Union{T, Nothing} for in practice?

Union{T, Nothing} is very common for field types and function argument types. Wasting such a nice syntax on a type that most people do not use (Missing) would be quite unfortunate in my opinion.

6 Likes

Given that some functions in Julia now return nothing when no sensible output exists, e.g.

julia> tryparse(Float64, "a")

julia> findfirst(==(3), [1, 2])

there was a proposal to add some ?-based syntactic sugar for working with nothing: https://github.com/JuliaLang/julia/issues/26303.

If that happens, then the T? notation would be more consistent with nothing than with missing.
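Until any such sugar exists, Base already has helpers for this pattern. The sketch below uses `something` and `isnothing`, both in Base since Julia 1.0 and 1.1 respectively. The fallback values (`NaN`, `0`) are arbitrary choices for illustration.

```julia
# `tryparse` and `findfirst` return `nothing` when there is no sensible result.
parsed = tryparse(Float64, "a")      # nothing
found  = findfirst(==(3), [1, 2])    # nothing

# `something` returns its first argument that is not `nothing`,
# which gives a concise way to fall back to a default value.
val = something(parsed, NaN)
idx = something(found, 0)

# `isnothing` makes the check explicit when each case needs its own logic.
i = findfirst(==(2), [1, 2])
msg = isnothing(i) ? "not found" : "found at index $i"
```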

1 Like

I explicitly wanted to avoid discussing which is more common (it depends on what you use Julia for) or whether data scientists or software engineers are first-class users, and instead to discuss what you would actually use it for. Would you use it in struct definitions?

struct Mystruct
  x::Int?
end

I think that’s not so nice, personally. My argument is that it’s mostly used for controlling function dispatch, and that f(x::Union{T, Nothing}) = ... is a lot less common than f(x::Union{T, Missing}) = ..., because in the first case you would usually either dispatch on T and Nothing separately or use duck typing.
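Here is a minimal sketch of the dispatch pattern described above. The field uses the full Union{Int, Nothing} spelling, and separate methods handle Int and Nothing instead of one method on the union. `describe` is a hypothetical function name used only for illustration.

```julia
# A struct whose field may be absent, spelled with the full Union.
struct Mystruct
    x::Union{Int, Nothing}
end

# Instead of a single method taking Union{Int, Nothing},
# dispatch on each case separately.
describe(x::Int) = "value $x"
describe(::Nothing) = "no value"

# The wrapper just forwards the field; dispatch picks the right method.
describe(s::Mystruct) = describe(s.x)

describe(Mystruct(3))        # "value 3"
describe(Mystruct(nothing))  # "no value"
```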

1 Like

In some small projects, I’ve defined |(::Type{A}, ::Type{B}) where {A, B} = Union{A, B} which always felt really nice. The main annoyance was that you couldn’t do

struct Foo
    x::Int|Float64
end

because :: binds more tightly than |, so x::Int|Float64 parses as (x::Int)|Float64. Instead, it had to be:

struct Foo
    x::(Int|Float64)
end
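Below is a sketch of that definition as it could be written today. The method is added to Base's existing `|` through the qualified name `Base.:|`, so it does not shadow the bitwise operator. Note that this is type piracy, since neither `|` nor `Type` is owned by this code, which is why it suits scripts and small projects better than packages.

```julia
# Add a method to Base's `|` for type arguments (type piracy; keep it local).
Base.:|(::Type{A}, ::Type{B}) where {A, B} = Union{A, B}

# Chaining works because Base reduces `a | b | c` to nested binary calls,
# and a Union is itself matched by ::Type{A}.
const MaybeNum = Int | Float64 | Missing

# Parentheses are still required in a field declaration.
struct Foo
    x::(Int | Float64)
end

Foo(1).x     # 1, stored as an Int
Foo(1.5).x   # 1.5
```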
8 Likes

I also don’t like the ? because it doesn’t look like a unary operator, especially placed on the end like that.

I like @quinnj’s Int|Missing much better, because

  • It is already a binary operator.
  • It is explicit enough to handle every case without being overly verbose: Int|Missing is easy enough to write and does not require yet another special alias that is specific to Missing and different for Nothing.

It also doesn’t bother me personally to need parentheses due to precedence; in fact I prefer it, because my brain also parses x::Int|Float64 as (x::Int)|Float64.
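You can check that grouping directly with `Meta.parse`, which returns the expression tree without evaluating it:

```julia
# Parse without evaluating, to see how the expression is grouped.
ex = Meta.parse("x::Int|Float64")
# The top-level node is a call to `|` whose first argument is the
# whole declaration `x::Int`, i.e. (x::Int) | Float64.
ex.head        # :call
ex.args[2]     # :(x::Int)

# With parentheses, the declaration wraps the entire union expression.
ex2 = Meta.parse("x::(Int|Float64)")
ex2.head       # :(::)
```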

4 Likes

Without being too hyperbolic, I think Julia will lose the data science crowd if there’s no convenient syntax for arrays that may contain missing values. To be honest, I still do some of my data analysis in R because dealing with missings, especially writing functions for them, is so annoying.

1 Like

Is writing Vector{Int|Missing} really that much more annoying than writing Vector{Int?}?

I should probably use missings more, but they are a bit of a hassle. In contrast, NaNs are really easy to handle (and most of my data is Float64 anyway), so I am stuck with that old-fashioned approach.

From my perspective (finance/stats), missings should be as simple as NaNs; that is, Union{T, Missing} should be implicit in T.

I get your point, but it’s hard to argue that in the face of a 5x slowdown:

julia> using BenchmarkTools

julia> x = rand(1000);

julia> @btime $x .+ $x;
  437.634 ns (1 allocation: 7.94 KiB)

julia> y = Union{Float64, Missing}[x;];

julia> @btime y .+ y;
  2.164 μs (9 allocations: 8.14 KiB)

NaNs are implemented in hardware; missing is not so lucky…
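To make the comparison concrete, here is a small sketch of how the two sentinels behave in Base Julia. Both propagate through arithmetic, each has its own idiom for skipping values, and they differ under comparison.

```julia
a = [1.0, NaN, 3.0]               # Vector{Float64}
b = [1.0, missing, 3.0]           # eltype Union{Missing, Float64}

# Both propagate through reductions...
sum(a)                            # NaN
sum(b)                            # missing

# ...so each needs an explicit way to skip the sentinel.
sum(filter(!isnan, a))            # 4.0
sum(skipmissing(b))               # 4.0

# Comparisons differ: NaN compares false, missing propagates.
NaN == NaN                        # false
missing == missing                # missing
```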

I’m all for a simplified syntax, btw. Implementing | so that Float64|Missing works seems like a great candidate. I also really like using ? for one or the other, but wouldn’t know which. And has no one brought up the case Float64|Missing|Nothing yet?

OK, I was probably too greedy.

Still, the handling of missings could be easier. My use case: I load some data that has no missings, do some data wrangling, and then want to set x[1001,37]=missing. I believe the current way of doing it is to first do convert(Array{Union{Float64,Missing}},x). Making this simpler (convenience functions? macros?) would help.
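Here is a minimal sketch of that workflow using only Base. The 1001×40 size is made up so that the index from the post is valid. The conversion allocates a copy with the wider element type, and the original `x` is left unchanged.

```julia
x = rand(1001, 40)                       # data with no missings (Matrix{Float64})

# Widen the element type so the array can hold `missing`; this copies.
y = convert(Array{Union{Float64, Missing}}, x)
y[1001, 37] = missing

# Equivalent spelling with the constructor.
z = Matrix{Union{Float64, Missing}}(x)
```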

Very good example. Quite similar to

julia> x = [1:10;];

julia> x[3] = 1.5
ERROR: InexactError: Int64(1.5)

I agree with you that a friendlier way of dealing with this would be very nice.

1 Like

There is the allowmissing function in Missings.jl.

Additionally, while I personally like the Int|Missing syntax, my experience with casual data scientists is that they would prefer something shorter. For exactly this reason, DataFrames.jl uses a shorter notation when printing column types in a displayed data frame, as otherwise they were simply too long.

If T? is taken for Union{T, Nothing} (I get the arguments behind it) then maybe we could choose some other short signal of missingness, e.g. a double ?? (it is not ideal though) or |??

3 Likes

eltype forcing assignment?

x = [1, 2, 3];
typeof(x) == Array{Int64, 1}
true

x[2] := 2.0;
typeof(x) == Array{Float64, 1}
true

seems highly energetic and frothy but not in a good way

Code that needs to do this kind of thing, e.g. Tables.buildcolumns, ends up defining helpers along the lines of setindex_or_widen! or push_or_widen!. These check the eltype: if the value is compatible, they do the operation in place and return the first argument; otherwise they widen the element type and return a different object. Maybe there could be some macro that makes this less clumsy? Say something like

@widen push!(x, a)
@widen x[i] = a
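Here is a minimal sketch of the helper pattern described above, using Base's `promote_type` to choose the widened element type. The names `setindex_or_widen!` and `push_or_widen!` follow the post but are hypothetical here. This is not the Tables.jl code. Because the widening branch returns a new array, callers must always rebind the result, which is exactly the clumsiness a `@widen` macro would hide.

```julia
# Hypothetical helpers: mutate in place when `val` fits the current eltype,
# otherwise return a copy with a widened eltype. Always rebind the result.
function setindex_or_widen!(x::AbstractArray{T}, val, inds...) where {T}
    S = promote_type(T, typeof(val))
    if S <: T
        x[inds...] = val          # fits: mutate in place
        return x
    end
    y = similar(x, S)             # same shape, wider eltype
    copyto!(y, x)                 # converts existing elements to S
    y[inds...] = val
    return y
end

function push_or_widen!(x::AbstractVector{T}, val) where {T}
    S = promote_type(T, typeof(val))
    S <: T && return push!(x, val)
    y = similar(x, S)
    copyto!(y, x)
    return push!(y, val)
end

v = [1, 2, 3]
v = setindex_or_widen!(v, 1.5, 3)        # eltype widens Int -> Float64
w = push_or_widen!([1.0, 2.0], missing)  # eltype Union{Missing, Float64}
```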

Personally, I feel that T? meaning Union{T, Missing} or Union{T, Nothing} would decrease the legibility and approachability of code. I like the T|Missing solution quite a bit but for people who complain that’s still too much writing, I think we should just allow ? to be used in variable / constant names such that one is free to define

const Int? = Int|Missing

or

const Int? = Int|Nothing

if that’s what they really need, but it seems like a bad thing to provide by default from base.

I would like '\questiondown' ( ¿ ) and '\exclaimdown' ( ¡ ) to become available within Julia.