Union Types - Good or Bad?

Recently, I watched Jeff Bezanson’s talk titled “What’s Bad About Julia”. It’s an excellent talk. Towards the end, someone asks the question: “How much better or worse would the world be without union?”

I must admit, I was very surprised by this question: for me, union types are an unalloyed good.
I’ll invert this statement, and volunteer that languages that don’t have unions, and that lack statements (AKA modern languages!), and have strong typing - like Rust - are almost insufferable.

My primary gripe is with match statements consisting of, say, two “arms”. I now add a simple println!() at the end of one of the arms - the code no longer type checks. (This in a match statement that wasn’t assigned to anything - I didn’t care about the return type.) I was quite annoyed at the compiler for not typing the match expression as Union[Void, Foo], and forcing me to make the arms have matching type. It’s a fundamental weakness in Rust’s type system which is disproportionately felt by beginners in the language.

Finally I arrive at my question: I’m hoping the community, in a constructive spirit, can shed light on what this user may have been complaining about. Simply put, from a user perspective, how can unions be bad? (I realize they complicate the design of the language and implementation. I’m not talking about that - I’m talking about whether/how they make the language worse for users.)

This is, what I don’t like about it:

julia> [ 1.0, missing, 2.0 ]
3-element Array{Union{Missing, Float64},1}:
 1.0
  missing
 2.0

It’s not a big deal for me, but it is something I do not like to see just to have missing values. This is probably because, before Julia, I used R almost exclusively for data analysis, and there I got used to a simple NA without thinking much about it. Now I am confronted with the cumbersome type Array{Union{Missing, Float64},1}.

So the complaint would be: union types encourage excessively long type expressions.

(I have taken the point of view of a completely new Julia user who isn’t used to thinking much about types, e.g. an R user. This user is not a programmer, just a user such as a data analyst. This past me still has some mental overload when he sees Array{Union{Missing, Float64},1} :wink:)

Can you provide an MWE for this?

What would you prefer instead?

You can still use NaN if you want:

julia> [1.0, NaN, 2.0]
3-element Array{Float64,1}:
   1.0
 NaN
   2.0

Since NaN has type Float64 and missing doesn’t, union types seem unavoidable to me in this case. I guess you could argue that NaN should be used in place of missing across the board, but that privileges Float64 (in addition to the various other problems that gave rise to the Missing type).
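
To make the trade-off concrete, here is a small sketch using only Base; the variable names are just for illustration. It compares the element type you get with NaN against the one you get with missing, for both float and integer data.

```julia
# NaN is an ordinary Float64 value, so the element type stays concrete.
with_nan = [1.0, NaN, 2.0]
@show eltype(with_nan)            # Float64

# missing has its own singleton type, Missing, so the element type becomes a union.
with_missing = [1.0, missing, 2.0]
@show eltype(with_missing)        # Union{Missing, Float64}

# Using NaN as the marker in integer data promotes every element to Float64.
ints_with_nan = [1, NaN, 2]
@show eltype(ints_with_nan)       # Float64

# missing works with any element type, at the cost of a union in the type.
ints_with_missing = [1, missing, 2]
@show eltype(ints_with_missing) == Union{Missing, Int}   # true
```

The integer case is where NaN as a general NA stops working: the integers are gone after construction.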

I’m sure you’re aware, but the mental model here is that for all T, R’s T is what Julia would call Union{Missing, T}. And then R also doesn’t have scalars, so R’s T means Julia’s Array{T, 1}. So R’s T is really always Julia’s Array{Union{Missing, T}, 1}.
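
A rough sketch of that mental model in Julia, assuming nothing beyond Base (the names r_like and no_missing are made up for the example):

```julia
# R's numeric vector c(1.0, NA, 2.0) corresponds to this in Julia.
r_like = Union{Missing, Float64}[1.0, missing, 2.0]

# A vector that promises "no missing values" has a different, simpler type.
no_missing = [1.0, 2.0]

@show typeof(r_like)
@show typeof(no_missing)

# Missing values propagate through arithmetic, as NA does in R.
@show sum(r_like)                # missing

# skipmissing plays the role of R's na.rm = TRUE.
@show sum(skipmissing(r_like))   # 3.0

# A scalar is a genuinely different thing from a one-element vector.
@show 1.0 isa Vector             # false
```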

R can get away with not supporting non-missing T because R users don’t seem to want the invariant of “no missing values” that much. R is really not getting away with conflating T and Array{T, 1}, as the R core developers themselves admit: [1409.3144] Enhancing R with Advanced Compilation Tools and Methods. Much of R’s need to fall back on C for everything stems from not supporting scalar types.

You haven’t seen anything yet :grin::

julia> Base.uniontypes(StridedVector{Int})
4-element Array{Any,1}:
 DenseArray{Int64,1}
 Base.ReinterpretArray{Int64,1,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray}
 Base.ReshapedArray{Int64,1,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{Base.ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T, SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray}
 SubArray{Int64,1,A,I,L} where L where I<:Tuple{Vararg{Union{Int64, AbstractRange{Int64}, Base.AbstractCartesianIndex, Base.ReshapedArray{T,N,A,Tuple{}} where A<:AbstractUnitRange where N where T},N} where N} where A<:Union{Base.ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T, Base.ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{Base.ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T, SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T, DenseArray}

That said, just because unions are sometimes used in ways that could be considered antipatterns doesn’t mean they aren’t very useful in a lot of situations. They also give some transparency into how type inference works, since they are essential for performance when certain code can only be partially inferred.
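
As an illustration of the inference point, here is a minimal sketch. The helpers first_negative and report are hypothetical, and what Base.return_types prints may vary between Julia versions, so no output is shown for it.

```julia
# Hypothetical helper: index of the first negative entry, or nothing if there is none.
function first_negative(xs)
    for (i, x) in pairs(xs)
        x < 0 && return i
    end
    return nothing
end

# Ask inference what it concludes for a Vector{Float64} argument.
# The result is a small union rather than Any.
@show Base.return_types(first_negative, (Vector{Float64},))

# Callers branch on the two cases explicitly.
function report(xs)
    i = first_negative(xs)
    return i === nothing ? "none" : "index $i"
end

@show report([1.0, -2.0])   # "index 2"
@show report([1.0, 2.0])    # "none"
```

The function cannot be inferred to a single concrete type, but the union keeps the set of possibilities small and visible.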

Nothing, I got used to it.

As you said, NaN is a float, so it’s not usable as general NA.

Sure I did :rofl:

To all: I thought for a while about the OP and didn’t find much that, for me, makes union types bad. So I somewhat contrived the one small issue I had at the beginning, just to say something. :smiley:

So back to topic, do you find something which may be an answer to

That piqued my curiosity: what is a language that lacks statements? I know Haskell does not have variables, and depending on your definition of statement it does not have statements either. Is this what you are referring to?

In old school languages like C++ there is a distinction between expressions and statements. For example, if (...) {} is a statement - and has no type - whereas (..) ? blah : blah is an expression associated with a type.

The language I was referring to in my gripe was Rust. It’s expression-oriented, so everything returns a value, and my complaint is that the lack of union types combined with this expression-orientedness makes for a very fussy experience.

Another language that “lacks” statements is Julia. But this is a good thing: you want everything to (optionally) return a value, but equally, I don’t want the compiler to be overly fussy when two branches return different types A and B (say). In Julia this is simply resolved to Union{A, B}.
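
A small sketch of what that looks like in practice. Both functions are hypothetical examples, and the exact output of Base.return_types is left unannotated since it comes from the compiler's inference.

```julia
# Hypothetical function whose two branches produce values of different types.
function parse_or_text(s::String)
    n = tryparse(Int, s)
    if n === nothing
        "not a number: $s"   # this branch yields a String
    else
        n                    # this branch yields an Int
    end
end

@show parse_or_text("42")
@show parse_or_text("abc")
@show Base.return_types(parse_or_text, (String,))

# Adding a println to one branch is accepted even though the branch types now differ.
function log_if_negative(x)
    if x < 0
        println("negative input: ", x)   # println returns nothing
    else
        x
    end
end

@show log_if_negative(3)
```

Neither definition needs an annotation or a wrapper type to reconcile the branches.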

I’ll throw my 2c in: wouldn’t this be solved if you could define a type alias, picked up by the compiler and displayed in error/warning/info messages, that showed the “moral name”? So in the Union{Missing, Float64} case you could do:

using OptionalFloat64 = Union{Missing, Float64}

… and after this you get nice compiler messages. Clang does something like this, and in addition (IIUC) has some cleverness to not display “obvious” template params.
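
For reference, the aliasing half of this already exists in Julia as a plain const binding. The sketch below uses the name OptionalFloat64 from the post and a made-up function count_present. Whether the alias name shows up in printed types depends on the Julia version, so no printed types are shown here.

```julia
# A const binding is how a type alias is written in Julia.
const OptionalFloat64 = Union{Missing, Float64}

v = OptionalFloat64[1.0, missing, 2.0]

# The alias is the same type object, not a new type.
@show OptionalFloat64 === Union{Missing, Float64}      # true
@show typeof(v) === Vector{Union{Missing, Float64}}    # true

# It can be used anywhere a type is expected, for example in a method signature.
count_present(xs::Vector{OptionalFloat64}) = count(!ismissing, xs)
@show count_present(v)                                  # 2
```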

I thought about another negative after posting: I guess UnionForAll types introduce time complexities in e.g. loading modules, manifested as the notorious TTFP (time to first plot) problem. I don’t understand enough about the underlying type theory to say whether a move to “better” traits, and away from union types, would improve this situation, though. My hunch is that it wouldn’t.

Cf

https://github.com/JuliaLang/julia/issues/14946

Thanks for that link; I skimmed over it and realized that I hadn’t factored in the issues arising from canonicalization of type names. That problem doesn’t arise in C++, because the underlying type system is relatively simple (no union types), and templates essentially project concrete types onto this simple underlying type system. And then there are concepts, which are a different thing altogether… There’s no question in my mind that Julia’s type system is superior.

Creating programming languages is…hard.

In Haskell everything is an expression (I do not even think it has the concept of a statement); however, it is very easy to deal with union types there (and they are very common). Then again, Haskell’s union types are often a little different from Julia’s. My understanding is that Julia union types are always abstract, while in Haskell there are many concrete union types and they are used often.
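
The Julia half of that claim can be checked directly. This is only a sketch for a union of two distinct types, with U as an illustrative name:

```julia
U = Union{Missing, Float64}

# No value has the union itself as its type; each value has one of the member types.
@show isconcretetype(U)   # false
@show typeof(1.0)         # Float64
@show typeof(missing)     # Missing

# Values of either member type belong to the union.
@show 1.0 isa U           # true
@show missing isa U       # true

# Both member types are subtypes of the union.
@show Float64 <: U        # true
@show Missing <: U        # true
```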

Isn’t this something like

julia> struct OptionalFloat64
       value::Union{Missing, Float64}
       end

julia> a=OptionalFloat64.([1.0 missing 2.0])
1×3 Array{OptionalFloat64,2}:
 OptionalFloat64(1.0)  OptionalFloat64(missing)  OptionalFloat64(2.0)
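
One difference worth noting between such a wrapper struct and an alias, sketched here with a hypothetical type named BoxedFloat64 so it cannot be confused with an alias of the same name:

```julia
# Hypothetical wrapper type around the union.
struct BoxedFloat64
    value::Union{Missing, Float64}
end

boxed = BoxedFloat64.([1.0, missing, 2.0])

# The wrapper is a new concrete type; its instances are neither Float64 nor Missing.
@show eltype(boxed)                          # BoxedFloat64
@show boxed[1] isa Union{Missing, Float64}   # false

# Existing functions do not see through the wrapper, so the field must be unwrapped.
@show sum(skipmissing(b.value for b in boxed))   # 3.0
```

So the struct gives a short name in printed types, but code written for Union{Missing, Float64} no longer applies to it without unwrapping.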

I played with Rust for a while, and while yes, it doesn’t have union types, that only really comes into effect when you want an array to contain different (primitive) types, which for me is a situation I’ve never really needed. For return types you have that “enum”-like thing where the different values in the enum can carry data of different types, which effectively gives you the union ability.

C really doesn’t do anything to “protect” a value, meaning you can treat a float like an Int WITHOUT doing any sort of conversion. Julia tries to protect the values, you said it was a float and by god you are going to treat it as a float!

Also C does have “unions”:

union Data {
   int i;
   float f;
   char str[20];
} data;  

You just create a “name” for your union grouping :slight_smile:.
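
For contrast, a minimal Julia sketch of the same two ideas. The function describe_value is made up for the example. Reading a float's bits as an integer has to be requested explicitly, and a value passed through a Union keeps its own type.

```julia
# Reading a Float32's bits as an integer must be asked for explicitly.
bits = reinterpret(UInt32, 1.0f0)
@show bits                        # 0x3f800000

# A value accepted through a Union still carries its own type, so it can be inspected.
function describe_value(x::Union{Int, Float32, String})
    x isa Int     && return "an Int"
    x isa Float32 && return "a Float32"
    return "a String"
end

@show describe_value(1)
@show describe_value(1.0f0)
@show describe_value("hi")
```

Unlike the C union, there is no way to store an Int here and read it back as a Float32 by accident.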

That someone was me.
And this is great because I can explain what I was thinking when I asked that.
It wasn’t a complaint about unions at all, but a specific question in the context of Jeff’s talk mentioning contradictions and issues in the type system.

The particular case that is easy to remove is along the lines of

foo(x::Union{A, B}) = ...

One can replace that with

for T in (:A, :B)
    @eval foo(x::$T) = ...
end
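
A runnable version of the two forms, as a sketch. The empty structs A and B and the method bodies are placeholders, and bar is used for the generated methods only so that both variants can live in the same session.

```julia
struct A end
struct B end

# One method whose argument type is the union of both types.
foo(x::Union{A, B}) = string("got ", nameof(typeof(x)))

# The same behaviour without a union: one method per type, generated in a loop.
for T in (:A, :B)
    @eval bar(x::$T) = string("got ", nameof(typeof(x)))
end

@show foo(A()), foo(B())   # ("got A", "got B")
@show bar(A()), bar(B())   # ("got A", "got B")

# The union version is a single method; the loop defines two.
@show length(methods(foo))   # 1
@show length(methods(bar))   # 2
```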

In fact, it is common to make this change to remove ambiguities.

Not everything can be made OK this way: there would be no more Vector{Union{Missing, Float64}}.
And it’s breaking, so it wasn’t actually a suggestion.
But I was curious whether removing unions would fix the things that break or are contradictory in the type system that Jeff mentioned during the talk,
since several seemed to involve unions.
