Proposed alias for union types

It’s been 11 years since the release of Julia 0.1 — my point is that this being the first time someone has even voiced a desire for a more concise Union syntax is not exactly evidence of a burning need.

Yes, but the issue has only come up in the last few years because the | syntax for unions has only recently become popular, particularly in TypeScript and Python. That precedent did not exist 11 years ago. It very much does now.

I can confess that the only reason I considered this idea at all (and thus why I started this thread) was because I started noticing this syntax more and more frequently in Python libraries.

It doesn’t seem unusual to me that it hasn’t been thought of until now… that’s just how languages evolve, by cross-breeding and mutating ideas in a larger ecosystem of other languages. And this syntax has only recently started becoming a dominant species.

I’ve actually seen this brought up on Slack and Discourse, but the syntax itself doesn’t add any new capabilities so someone would have to make a PR and bikeshed with the understanding that there’s a very real possibility that the proposal is not accepted.

There is a conflicting usage. In Python T1 | T2 is a union type. In functional languages, T1 | T2 is a sum type (aka tagged union), not a union type.

Julia users have been exploring sum types eg in SumTypes.jl. These tend to be especially useful in combination with pattern matching.

A tagged union is pretty similar to a sum type. In Julia the main difference from a plain Union would be giving each union member a unique name as an identifier.
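To make the naming point concrete, here is a minimal sketch. Tagged is a hypothetical wrapper invented for this example (it is not how SumTypes.jl works internally), and PlainLength, Length and in_meters are likewise made-up names. The idea is only that a plain Union cannot hold the same type twice, while tagged members stay distinct.

```julia
# Hypothetical wrapper: `name` is a Symbol used purely as a tag.
struct Tagged{name, T}
    value::T
end

# Convenience constructor so the tag can be given without spelling out T.
Tagged{name}(value::T) where {name, T} = Tagged{name, T}(value)

# A plain Union collapses duplicate members, so the two "cases" become one type.
const PlainLength = Union{Float64, Float64}

# With tags, both cases wrap a Float64 but remain different types.
const Length = Union{Tagged{:meters, Float64}, Tagged{:feet, Float64}}

# Dispatch can now tell the cases apart by tag.
in_meters(x::Tagged{:meters}) = x.value
in_meters(x::Tagged{:feet}) = x.value * 0.3048
```

Note that nothing here checks exhaustiveness; the sketch only shows where the extra name lives.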

It’s consequential because, for one thing, when pattern matching on one member of a sum type, you can statically check that you’re not missing any branches. You can’t do that with a value that doesn’t know which other values are possible.

I don’t think there is a guarantee but @inline might help? In any case, | chains lower to a sequence of binary operations so you’d have to write out or append that expression for benchmarks. The earlier talk about functions calling (|)(...) calling Union{...} should be looked into but is ultimately an aside.
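For anyone who wants to see the lowering point for themselves, a quick check with Meta.parse (A, B and C are just placeholder symbols) shows that a | chain parses as nested binary calls, whereas a + chain parses as a single n-ary call:

```julia
# `|` is left-associative: the chain becomes nested two-argument calls.
ex_or = Meta.parse("A | B | C")

# `+` is special-cased by the parser: the chain becomes one call with three arguments.
ex_plus = Meta.parse("A + B + C")

# Number of operands each top-level call receives (args[1] is the operator itself).
n_or = length(ex_or.args) - 1      # 2 operands: (A | B) and C
n_plus = length(ex_plus.args) - 1  # 3 operands: A, B and C
```

So a hypothetical method for | on types would only ever see two arguments at a time from infix syntax, whatever its Vararg signature allows.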

Yeah that wouldn’t be good. I wonder: how does the underlying variadic function in Core.apply_type work without having to contend with specialization/recompilation? I tried reading up on variadic functions in C but couldn’t find a simple, straight answer, and the definitions in builtins.c seem to be macros that I just couldn’t expand in my head.

Still possible if it’s not breaking, but yeah I don’t see it happening if it doesn’t add some capability or improve things a lot for people. Feature bloat isn’t desirable.

Existing code may handle the call assuming the signature is not implemented, so undoing that is still considered type piracy. That doesn’t mean you can’t ever do it; maybe you want that change to fix bugs, or that call signature may really be never used so the change is isolated. You would have to test thoroughly to look for things going out of control and be open to bug reports. Involving new types is the only way to easily prove the call signatures only affect new code.

I’d rather not need to learn cosmetic features just because some other language started doing it. I had once actually looked for a Python feature before, specifically the walrus operator letting you assign and use a variable within a conditional statement and other one-liners. I learned to ask “can this be done easily already” real fast when someone pointed out that not only could parenthesized assignment already do this, you can make multiple-line conditional statements in let and begin blocks with a boolean value in the last line.
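For reference, a small sketch of both idioms mentioned here. The function name describe and its messages are made up for illustration:

```julia
function describe(xs)
    # Parenthesized assignment: bind `n` and test it in the same condition.
    if (n = length(xs)) > 2
        return "long ($n items)"
    end

    # A begin block as the condition: its last line is the boolean that gets tested.
    if begin
            total = sum(xs)
            total > 0
        end
        return "short, positive total $total"
    end

    return "short, non-positive total $total"
end
```

Both n and total remain ordinary function-local variables after the condition, since begin blocks do not introduce a new scope.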

@jar1 I don’t think this would necessarily be incompatible with sum types because you would need a different syntax to describe the tagging anyways. In other words, Base.:|(::Vararg{Type}) is not enough information to create a sum type.

If Base.:| is defined in this way in Base, then SumTypes could look at overloading a Base.:|(::Vararg{SumTypes.Tagged}). Which IMO is a more palatable syntax if Int | Float64 starts being allowed for untagged/normal unions. So basically we are on the same team here :slight_smile:

Although since SumTypes requires creating new structs which are not actually Union, you would still need a macro to pull this off… In any case, SumTypes is an external package not in Base, which is also still in an experimental stage (I use it in one of my packages; it’s great!) so not sure how practical it is to discuss here.


@Benny you have mentioned type piracy a few times but I am not sure why. Just to be clear, the idea we are discussing is that Base.:|(::Vararg{Type}) could be added to Base itself. This is not type piracy. Only if I defined that method myself in one of my own packages would it be type piracy.
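As a side note, it is possible to try the syntax today without piracy and without touching Base, by shadowing | inside one's own module. This is only a sketch under that assumption: the module name PipeUnions and the aliases are invented for illustration, and this is not the proposed Base method itself.

```julia
module PipeUnions

# Defining `|` here creates a new function local to this module instead of
# adding a method to Base.:|, so code outside the module is unaffected.
|(a, b) = Base.:|(a, b)            # everything else falls back to Base's `|`
|(a::Type, b::Type) = Union{a, b}  # two types combine into a Union

const MaybeInt = Int | Nothing
const Scalar = Int | Float64 | Nothing   # parsed as (Int | Float64) | Nothing

flags() = 0x01 | 0x04   # still bitwise-or, via the fallback

end # module
```

The trade-off is that the sugar only exists inside modules that opt in, which is exactly why it cannot affect existing code.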

Correct, except a pirating method could be defined from any module, not just packages. The point was that you couldn’t assume implementing an unimplemented call signature would not affect existing code because you’re doing the same changes included in type piracy. The distinction from type piracy is that by contributing to Base directly, you have the responsibility to find and handle any bugs that arise from the change. The principle of new types being the only easy proof of no effects on existing code applies whether you’re pirating or contributing.

FWIW, before union types we should discuss unions of sets.

Currently | does not compute the union of sets, and & does not produce their intersection.

I believe that | for union types comes from unions of sets, under the interpretation that a type represents a set and isa represents elementOf.

I believe that | for unions of sets comes from the duality between predicates and sets, i.e. {x: p(x)} \cup {x: q(x)} = {x: p(x) | q(x)}; given that we were restricted to ASCII / typewriter characters back when this convention was made, it makes sense. It also meshes nicely with bitwise-or and bags-of-flags.

If we want to continue that grand tradition of conflating logical operations on predicates with set-ops on their extents, then we should also do that for sets, not just for types.
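For comparison, this is roughly how the two worlds look today, as a small sketch using only existing Base functions (the variable names are arbitrary):

```julia
a = Set([1, 2, 3])
b = Set([3, 4])

# Set operations are spelled with functions (or the Unicode infix operators).
u = union(a, b)        # same as a ∪ b
i = intersect(a, b)    # same as a ∩ b

# The analogous operations on types, reading `isa` as "element of".
U = Union{Int, String}
one_is_member = 1 isa U
narrowed = typeintersect(U, Integer)   # keeps only the members that are Integers
```

An infix | for types would therefore line up with Union/typeintersect on the type side, but have no ASCII counterpart on the set side unless sets got one too.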

I personally like the Int32 | UInt32 notation a lot.

But the current Union{...} is not overly clunky, and my capacity for syntactic annoyances is fully used up by the lack of an ASCII infix xor (a multi-letter one would suffice, like /xor/ or something; it just needs to be parsed infix).

Ideally only the most frequently used parts of the language should get dedicated infix syntax.

It is not apparent to me that Union is one of these. I usually use it less than once for every 100 LOC, and prefer to define a type alias if I end up repeating the same Union{...} declarations.
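The alias approach is just the usual const pattern. A tiny sketch, with MaybeString, greet and shout being made-up names:

```julia
# Name the Union once...
const MaybeString = Union{String, Nothing}

# ...and reuse the alias in every signature that needs it.
greet(name::MaybeString) = name === nothing ? "hello, stranger" : "hello, $name"
shout(name::MaybeString) = uppercase(greet(name))
```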

I count roughly 1700 uses of Union (with grep -r base) in Julia Base code, which may be more than typical. Around 220 of these are Union{}, which this syntax would not change, around 100 are comments.

I don’t think that introducing special syntax for something typical code uses in approximately every 100th line makes sense.

Thanks for the analysis, @Tamas_Papp! I really like your approach to quantify this sort of thing. While it can’t be a comprehensive way to study the problem (e.g., considering @mkitti’s points about users coming from other languages where | is used – this might change the scale of a particular LOC frequency), it is a really nice way to frame things in a more objective light.

Do you have on-hand what typical numbers are for other infix operators in Base? e.g., |(::Integer, ::Integer)?

I’m also very curious what the difference is with the greater package ecosystem. I can try to look at this later today if you don’t get a chance.

I did not make a comprehensive study, but eg for + it is more than 3300 counts using grep.

But note that typically Julia uses infix for arithmetic, and there is very little arithmetic done per se in Base. I would expect that in Base, Union is overrepresented and arithmetic is underrepresented, compared to typical Julia packages. Eg in LinearAlgebra,

tamas@tamas:~/src/julia-1.10/stdlib/LinearAlgebra$ grep '+' -r . | wc -l
1755
tamas@tamas:~/src/julia-1.10/stdlib/LinearAlgebra$ grep 'Union' -r . | wc -l
331

FWIW, in my own code it seems to be around one usage of Union for every 500 LOC. But maybe I am atypical.
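If someone wants to repeat this kind of count without grep, here is a rough Julia equivalent. It is a sketch only: count_matching_lines is a made-up helper, and unlike the grep -r invocations above it deliberately looks only at .jl files. Like grep piped to wc -l, it counts matching lines, not individual occurrences.

```julia
# Count lines containing `needle` in all .jl files below `dir`.
function count_matching_lines(dir::AbstractString, needle::AbstractString)
    total = 0
    for (root, _, files) in walkdir(dir)
        for file in files
            endswith(file, ".jl") || continue   # skip non-Julia files
            for line in eachline(joinpath(root, file))
                occursin(needle, line) && (total += 1)
            end
        end
    end
    return total
end
```

Usage would be something like count_matching_lines("stdlib/LinearAlgebra", "Union"), with the path adjusted to wherever the sources live.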

Aren’t all of :, ::, , ==, ===, &&, ||, ->, =>, and even . infix operators?

Sure they are.

You may want to compile your own statistics based on the package ecosystem, but note that usage frequency is not the only motivation to make an operator infix. Eg short-circuiting && and || have to be infix because they are not function calls, etc.

Also, you may find that some super-rarely used operators are infix, eg ++. But that does not imply that Union should have an infix version, just that the language designers left some room for expansion in the syntax.

Would it be possible to test the usefulness of this syntactic sugar by using it in type reporting tools such as Cthulhu.jl or the VS Code extension?

I’m not sure that this is the best metric, since a lot of conventional infix operators (e.g. %, &, \) are used in only a small fraction of Julia code … but when they are used, the code would be much less readable without them. Arithmetic operators are often deeply nested, which favors infix notation. Whereas Union{...} is almost never nested — it is typically used only at the outermost level of a type expression.

I’m not sure nestedness matters to liking or disliking the syntax (I still like it even at the top-level). I also think “almost never” is maybe a bit of an exaggeration…

We can also just quantify this. If I dump all the code in Base that matches Union, here’s the last 50 lines –

    Base.get_bool_env(name::String, default::Bool)::Union{Bool,Nothing}
    code::Union{CodeInfo,Core.MethodInstance,Nothing}
    mod::Union{Module,Nothing}
    ret = Vector{Union{InterpreterIP,Ptr{Cvoid}}}()
            code = bt2[j]::Union{CodeInfo,Core.MethodInstance,Nothing}
            mod = njlvalues == 2 ? bt2[j+1]::Union{Module,Nothing} : nothing
A wrapper type used in `Union{Some{T}, Nothing}` to distinguish between the absence
promote_rule(T::Type{Nothing}, S::Type) = Union{S, Nothing}
    return Union{R, Nothing}
    R <: Union{} && error("cannot convert a value to nothing for assignment")
lcm(a::Union{Integer,Rational}) = gcd(a)
const HWReal = Union{Int8,Int16,Int32,Int64,UInt8,UInt16,UInt32,UInt64,Float32,Float64}
const HWNumber = Union{HWReal, Complex{<:HWReal}, Rational{<:HWReal}}
powermod(x::Integer, p::Integer, m::Union{Int128,UInt128}) = oftype(m, powermod(x, p, big(m)))
function isqrt(x::Union{Int64,UInt64,Int128,UInt128})
falses(dims::NTuple{N, Union{Integer, OneTo}}) where {N} = falses(map(to_dim, dims))
trues(dims::NTuple{N, Union{Integer, OneTo}}) where {N} = trues(map(to_dim, dims))
function unsafe_copyto!(dest::BitArray, doffs::Integer, src::Union{BitArray,Array}, soffs::Integer, n::Integer)
copyto!(dest::BitArray, doffs::Integer, src::Union{BitArray,Array}, soffs::Integer, n::Integer) =
function _copyto_int!(dest::BitArray, doffs::Int, src::Union{BitArray,Array}, soffs::Int, n::Int)
@propagate_inbounds function setindex!(B::BitArray, X::AbstractArray, J0::Union{Colon,AbstractUnitRange{Int}})
function splice!(B::BitVector, r::Union{AbstractUnitRange{Int}, Integer}, ins::AbstractArray = _default_bit_splice)
function splice!(B::BitVector, r::Union{AbstractUnitRange{Int}, Integer}, ins)
        ($f)(A::Union{BitMatrix,BitVector}, B::Union{BitMatrix,BitVector}) = ($f)(Array(A), Array(B))
(>>)(B::BitVector, i::Union{Int, UInt}) = B >>> i
function findnext(pred::Fix2{<:Union{typeof(isequal),typeof(==)},Bool},
function findprev(pred::Fix2{<:Union{typeof(isequal),typeof(==)},Bool},
map(::Union{typeof(~), typeof(!)}, A::BitArray) = bit_map!(~, similar(A), A)
map!(::Union{typeof(~), typeof(!)}, dest::BitArray, A::BitArray) = bit_map!(~, dest, A)
for (T, f) in ((:(Union{typeof(&), typeof(*), typeof(min)}), :(&)),
               (:(Union{typeof(|), typeof(max)}),            :(|)),
               (:(Union{typeof(xor), typeof(!=)}),           :xor),
               (:(Union{typeof(>=), typeof(^)}),             :((p, q) -> p | ~q)),
function hcat(A::Union{BitMatrix,BitVector}...)
_cat(dims::Integer, X::Union{BitArray, Bool}...) = _cat(Int(dims)::Int, X...)
function _cat(dims::Int, X::Union{BitArray, Bool}...)
function _split_rest(a::Union{Vector, BitVector}, n::Int)
function rationalize(::Type{T}, x::Union{AbstractFloat, Rational}, tol::Real) where T<:Integer
/(x::Rational, y::Union{Rational, Integer, Complex{<:Union{Integer,Rational}}}) = x//y
/(x::Union{Integer, Complex{<:Union{Integer,Rational}}}, y::Rational) = x//y
    env::Union{Vector{String},Nothing}
    cpus::Union{Nothing,Vector{UInt16}}
                 cpus::Union{Nothing,Vector{UInt16}} = cmd.cpus,
function show(io::IO, cmds::Union{OrCmds,ErrOrCmds})
setup_stdio(stdio::Union{DevNull,OS_HANDLE,RawFD}, ::Bool) = (stdio, false)
const Redirectable = Union{IO, FileRedirect, RawFD, OS_HANDLE}
ignorestatus(cmd::Union{OrCmds,AndCmds}) =
byteenv(env::Union{AbstractVector{Pair{T,V}}, Tuple{Vararg{Pair{T,V}}}}) where {T<:AbstractString,V} =
pipeline(src::Union{Redirectable,AbstractString}, cmd::AbstractCmd) = pipeline(cmd, stdin=src)
    parsed::Tuple{Vararg{Tuple{Vararg{Union{String, SubString{String}}}}}}

So it’s far from “almost never” nested…

Anyways once I find some time I can try to do a frequency analysis and also include nestedness for those interested.
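One crude way to measure nestedness, as a sketch rather than a proper analysis: record the curly-brace depth at which each Union{ appears on a line. The helper name union_depths is made up, and the approach is purely textual, so it will also count occurrences inside comments and strings.

```julia
# Return the brace depth at each occurrence of "Union{" in `line`.
# Depth 0 means the Union is not nested inside another type's parameters.
function union_depths(line::AbstractString)
    depths = Int[]
    depth = 0
    i = firstindex(line)
    while i <= lastindex(line)
        # Record the depth *before* the Union's own opening brace is counted.
        if startswith(SubString(line, i), "Union{")
            push!(depths, depth)
        end
        c = line[i]
        depth += (c == '{') - (c == '}')
        i = nextind(line, i)
    end
    return depths
end
```

For example, the lcm line from the dump gives a single depth of 0, the Vector{Union{InterpreterIP,Ptr{Cvoid}}} line gives 1, and the Rational division line gives 0 for the outer Union and 2 for the inner one.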

I would agree that’s a good use-case as it definitely makes the printouts more concise! Could even happen without this change in Julia itself (as an opt-in configuration option) for users who are comfortable with | from TypeScript/Python/etc.
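A tool could do something along these lines. This is only a sketch: pipe_string is a made-up helper, it handles only a top-level Union, and Base.uniontypes is an internal (unexported) function, so relying on it is an assumption that may not hold across Julia versions.

```julia
# Render a top-level Union with `|` between its members; other types print as usual.
# Base.uniontypes is internal to Base and used here only for illustration.
function pipe_string(T)
    T isa Union || return string(T)
    return join(Base.uniontypes(T), " | ")
end
```

The order of the members in the output follows whatever order Julia stores them in, which is not necessarily the order they were written in.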