Proposed alias for union types

It would be interesting to know whether the developers of Cthulhu.jl and the VS Code extension would be willing to adopt one of the proposed syntactic sugars and gather feedback on it.

Could there be a poll for people to choose which option they prefer?

Collections on that page predate the formalization of interfaces (like abstract arrays, iteration, broadcasting). I agree that it is confusing, but there are no strong incentives to clean it up since people get by with informal definitions.

In retrospect, packages are the best place to evolve interfaces. Tables.jl is a great example of how well that works, but also how long it takes to refine interfaces (6 years and still ongoing). Maybe one day someone will formalize collections.

Incidentally, I think that the | syntax for type unions should be explored in a package too if that is feasible. Discussions about features are helpful, but some corner cases only appear with usage. It is of course technically type piracy, but as long as that is clarified I see no problem.
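
For concreteness, here is a minimal sketch of what such a pirating definition could look like. It is only an illustration, not taken from any existing package, and the function name widen_small is made up for the example.

```julia
# Type piracy: this adds a method to Base's `|` for `Type` arguments,
# and the defining code owns neither the function nor the argument types.
Base.:|(a::Type, b::Type) = Union{a, b}

# Signature types are ordinary expressions, so the infix form works directly.
# The parentheses matter: `x::Int8 | UInt8` parses as `(x::Int8) | UInt8`.
widen_small(x::(Int8 | UInt8)) = Int(x)

widen_small(Int8(3))   # dispatches through Union{Int8, UInt8}
```

Because the method lives in Base's own function, every package loaded in the same session sees it, which is why it counts as piracy.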

5 Likes

It is of course possible to use it in a package. It is very feasible in a single-user project.

However, nobody sane should use that for “real” projects: The upside in terms of improved syntax is small, and this imposes a real “weirdness tax” on new developers / users of your project.

Saying “do this in packages” is mostly equivalent to “don’t do this”, due to network effects.

7 Likes

I created a package at GitHub - mkitti/OrUnions.jl: Julia package to evaluate alternate Union syntax via | and ∨ . To avoid piracy I used ∨ (\vee) rather than |.

julia> 0x5::(UInt8 ∨ Int8)
0x05

julia> f(x::(Int8 ∨ UInt8 ∨ Int)) = x
f (generic function with 1 method)
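
As a rough sketch of why ∨ avoids piracy (this is an assumed one-line definition for illustration, not necessarily how OrUnions.jl implements it; small_int is a made-up name):

```julia
# Hypothetical definition; `∨` is typed as \vee<TAB> in the REPL.
# Base does not define `∨`, so this code owns the function and pirates nothing.
∨(a::Type, b::Type) = Union{a, b}

# Chaining works because `Union{Int8, UInt8}` is itself a `Type`,
# and nested unions are flattened automatically.
small_int(x::(Int8 ∨ UInt8 ∨ Int)) = x

small_int(0x05)
```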

If you really want to use |, I created a macro called @orunion.

julia> @orunion function foo(::Int8 | UInt8 | Int16) end
foo (generic function with 1 method)

julia> methods(foo)
# 1 method for generic function "foo" from Main:
 [1] foo(::Union{Int16, Int8, UInt8})
     @ REPL[34]:1

julia> @macroexpand @orunion function foo(::Int8 | UInt8 | Int16) end
:(function foo(::Union{Int16, UInt8, Int8})
      #= REPL[36]:1 =#
      #= REPL[36]:1 =#
  end)
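
For readers curious how such a rewrite can work, here is a simplified sketch of an expression-walking macro. It is not the actual @orunion implementation: the names or_to_union and @unionize are hypothetical, it rewrites every | call it finds (including any in a function body), and it only handles the parenthesized form x::(A | B), not the bare x::A | B that @orunion accepts.

```julia
# Leave non-Expr leaves (symbols, literals, line numbers) untouched.
or_to_union(ex) = ex

# Recursively rewrite every call `a | b` into `Union{a, b}`.
function or_to_union(ex::Expr)
    args = map(or_to_union, ex.args)
    if ex.head === :call && args[1] === :|
        return Expr(:curly, :Union, args[2:end]...)
    end
    return Expr(ex.head, args...)
end

macro unionize(ex)
    return esc(or_to_union(ex))
end

# Usage on a method definition with a parenthesized union
@unionize keep8(x::(Int8 | UInt8)) = x
```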

julia> @orunion bar(x::Int8 | UInt8, y::Int16 | UInt16, z::(Int8 | UInt8) | Int) = nothing
bar (generic function with 1 method)

julia> methods(bar)
# 1 method for generic function "bar" from Main:
 [1] bar(x::Union{Int8, UInt8}, y::Union{Int16, UInt16}, z::Union{Int64, Int8, UInt8})
     @ REPL[38]:1

julia> @macroexpand @orunion bar(x::Int8 | UInt8, y::Int16 | UInt16, z::(Int8 | UInt8) | Int) = nothing
:(bar(x::Union{UInt8, Int8}, y::Union{UInt16, Int16}, z::Union{Int, Union{UInt8, Int8}}) = begin
          #= REPL[40]:1 =#
          nothing
      end)

julia> methods(@orunion (x::Int8 | UInt8)->5)
# 1 method for anonymous function "#35":
 [1] (::var"#35#36")(x::Union{Int8, UInt8})
11 Likes

This is great @mkitti, nice work.

I also wanted to append a couple of favorite examples after trying out an infixed type union on some of my code –

1. A common workflow in data science is to have tables with missing values, which means you might have some table schema that look like this:

julia> typeof(schema((a=[1f0, missing], b=[missing, "hello"], c=[1, missing])))

ScientificTypes.Schema{
    (:a, :b, :c),
    Tuple{Union{Continuous,Missing},Union{Textual,Missing},Union{Count,Missing}},
    Tuple{Union{Float32,Missing},Union{String,Missing},Union{Int64,Missing}},
}

With an infix operator, this gets simplified to:

ScientificTypes.Schema{
    (:a, :b, :c),
    Tuple{(Continuous|Missing),(Textual|Missing),(Count|Missing)},
    Tuple{(Float32|Missing),(String|Missing),(Int64|Missing)},
}

I find it’s easier to parse like this. Not only is it much less verbose, but I feel like it is structurally clearer. Maybe this is because you can clearly identify the different containers from their style – that being (A|B) for unions and MyType{A,B} for parametrized types. Whereas before this, squinting at the page, Union{A,B} looks like it could be another parametrized type that contains both an A and a B for some of its fields. But Union is special in that its type parameters are not all used in a single instantiation – so the visual cue really helps.

2. Setting defaults is easier to read:

Before:

f(; cleanup::Union{Bool,Nothing}=nothing) = ...

which becomes

f(; cleanup::(Bool|Nothing)=nothing) = ...

I like that Bool appears right next to cleanup. Before this, the first thing I see is that cleanup is a Union, and after parsing one more word I see that it could be a Bool. But after this, I first read that cleanup is a Bool, and the next symbol is “or this other thing”.

But stylistically the choice could be context-dependent. Maybe you would want to very explicitly inform a user that it is a Union – so you could just use that instead.

8 Likes

I think it’s really helpful to have real-world examples like those! In my experience (and it looks like yours, too) Unions are most common as either Union{T, Missing} or Union{T, Nothing}. Personally, I’d aim for a solution that is more laser-focused on those use-cases. For example, it looks like a post-fix ? operator would be marginally available and might even be customizable. There could exist some future Julia world in which this would work:

julia> using Missings: ?

julia> Int?
Union{Missing, Int64}

julia> typeof(schema((a=[1f0, missing], b=[missing, "hello"], c=[1, missing])))
ScientificTypes.Schema{
    (:a, :b, :c),
    Tuple{Continuous?, Textual?, Count?},
    Tuple{Float32?, String?, Int64?},
}

#=--- or even ---=#
julia> var"?"(::Type{T}) where {T} = Union{T,Nothing}
? (generic function with 1 method)

julia> String?
Union{Nothing, String}

julia> Meta.@lower T?
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1 = var"?"(T)
└──      return %1
))))

This would match how DataFrames currently uses the ? symbol in its output:

julia> df = DataFrame(a=[1,missing], b=["a",missing], c=[1.0,missing])
2×3 DataFrame
 Row │ a        b        c
     │ Int64?   String?  Float64?
─────┼─────────────────────────────
   1 │       1  a              1.0
   2 │ missing  missing  missing

See: Allow '?' in variable and function names · Issue #22025 · JuliaLang/julia · GitHub and Use ? to lift missing values · Issue #36628 · JuliaLang/julia · GitHub

9 Likes

? == Union{T,Missing} would be great also; it’s extremely intuitive! But I’m afraid I may not have the energy for another one of these threads; maybe someone else could carry the torch for that :sweat_smile:

I’d be more in favor of just simplifying all Unions at once. Especially given that | has such a strong precedent in other languages, and is not quite as frequently used as the ternary operator.


By the way, I really want to emphasize the point I made in the comment above, since I hadn’t thought about it until now, and it convinced me even more that we should try |:

Union is a unique type in that not all of its type parameters are used for a single instantiation

When you look at a container type like Container{A,B,C}, you might infer that each of A, B, and C are all used in the fields and necessary (not always, but often). But Union is a special type in that only one of its type parameters is used at a time! Or within a single method definition.
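
A small illustration of that difference, using a made-up container type Pair2 as a stand-in for Container{A,B,C}:

```julia
# Hypothetical container: both type parameters are used by every instance.
struct Pair2{A, B}
    first::A
    second::B
end

p = Pair2{Int, String}(1, "one")
fieldtypes(typeof(p))        # both Int and String appear

# A Union value is an instance of just one of the listed types.
1 isa Union{Int, String}
"one" isa Union{Int, String}
p isa Union{Int, String}     # false: holding both is not the same as being either
```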

Thus, I think some sort of structural visual cue really helps. For example:

function f(x::Tuple{Union{Float64,Int64},MyType{<:Union{Float64,Nothing}}})
    a, b = x
    ...
end

If you are skimming code at 2am, it takes an extra second to parse that a could be a Float64 OR an Int64 – rather than both those types appearing somewhere. But compare this with a different visual cue for unions:

function f(x::Tuple{(Float64|Int64),MyType{<:(Float64|Nothing)}})
    a, b = x
    ...
end

To me, this highlights the variable parts in the signature, so in my head I can quickly imagine the behavior for different type combinations.

6 Likes

Personal opinion, but I somewhat strongly disagree here. Even if those cases are common, it feels really weird to privilege them in the parser, when there’s lots of other domains and use cases where people want to talk about other types of unions, and the conflict between Missing and Nothing isn’t likely to be resolved anyways.

This one seems really unlikely to me. The ternary operator a ? b : c is already such a big source of confusion and parser quirks, and this would make it even worse. E.g.

a?
b : c

would mean something completely different from

a ?
b : c

if that operator was added.

4 Likes

The ternary ? : syntax requires spaces everywhere to work. a? is a syntax error no matter how you break it up:

julia> a?
ERROR: syntax: space required before "?" operator
Stacktrace:
 [1] top-level scope
   @ none:1

This could be done without special casing either Nothing or Missing. All it’d do is parse T? akin to T' — and just like LinearAlgebra defines the post-fix ' to mean adjoint (transpose) while Diffractor uses it for adjoint (derivative), this could be available for packages and users to define as they wish. Note that this can extend beyond types to functions, to allow “lifting” functions similarly.
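
To make the analogy with ' concrete, here is a sketch of how postfix ' can already be given a local meaning today. It assumes Julia 1.6 or later, where x' lowers to a call of var"'" looked up in the current module; the module name LiftDemo and the function lifted_int are made up for the example.

```julia
module LiftDemo   # hypothetical module name

# Shadow postfix `'` inside this module only; Base's adjoint is untouched.
var"'"(::Type{T}) where {T} = Union{T, Missing}

# Inside the module, `Int'` now calls the local definition.
lifted_int() = Int'

end # module

LiftDemo.lifted_int()
```

A user-definable postfix ? would presumably follow the same pattern, with each module choosing Missing, Nothing, or something else.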

1 Like

Currently yes, but your proposal is that the a? case would no longer throw an error. So

a ?
b : c

becomes

if a
    b
else
    c
end

whereas

a?
b : c

would parse to

Union{a, Missing}
(:)(b, c)
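
The two building blocks of that argument can be checked in current Julia; this sketch only shows today's behavior, not the hypothetical a? parse:

```julia
# The ternary with spaces parses to an `if` expression.
ex = Meta.parse("a ? b : c")
ex.head    # :if
ex.args    # the condition and the two branches: a, b, c

# A standalone `b : c` is a call to the range function `:`.
(:)(1, 3) == 1:3
```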
4 Likes

If you follow the link into the discussion in other languages, you also see a discussion of using ? for an optional syntax.

With that in mind there is also

julia> const Optional{T} = Union{T,Nothing}
Union{Nothing, T} where T

julia> Optional{Float64}
Union{Nothing, Float64}

julia> const Maybe{T} = Union{T,Missing}
Union{Missing, T} where T

julia> Maybe{Int}
Union{Missing, Int64}
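
As a usage sketch, such an alias can stand in for the Union in the keyword-default example from earlier in the thread. The function cleanup_label is made up for illustration:

```julia
const Optional{T} = Union{T, Nothing}

# Hypothetical function: `cleanup` may be a Bool or left unset.
cleanup_label(; cleanup::Optional{Bool} = nothing) =
    cleanup === nothing ? "unset" : (cleanup ? "on" : "off")

cleanup_label()
cleanup_label(cleanup = true)
```

This needs no parser change, at the cost of one more name for readers to learn.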
5 Likes

This thread is long enough to see where minds are settling.

If it is necessary to implement a change and add |, ? or both, what would be the best action?

  • | (or equiv notation) should be implemented first.
  • ? (or equiv notation) should be implemented first.
  • They both should be implemented together.
  • The premise is bad. Let’s leave things be for the time being.
0 voters
2 Likes

Regarding ? as a prefix, do note the current REPL tab-completion feature for listing methods that apply to a set of arguments.

julia> ?(5, 3.0, nothing)[TAB]
Slices(A::P, slicemap::SM, ax::AX) where {P, SM, AX} @ Base slicearray.jl:41
broadcast(f::Tf, As...) where Tf @ Base.Broadcast broadcast.jl:811
broadcast!(f::Tf, dest, As::Vararg{Any, N}) where {Tf, N} @ Base.Broadcast broadcast.jl:850
clamp(x::X, lo::L, hi::H) where {X, L, H} @ Base.Math math.jl:89
error(s::Vararg{Any, N}) where N @ Base error.jl:42
hvcat(nbc::Int64, as...) @ Base abstractarray.jl:2011
hvncat(dim::Int64, xs...) @ Base abstractarray.jl:2259
methods(f, t, mod::Union{Nothing, Tuple{Module}, AbstractArray{Module}}) @ Base reflection.jl:988

While related, I think the question of special-casing Nothing or Missing is separable from the question of | for type unions, in that they could be discussed and implemented on independent tracks.

1 Like

It seems that if folks insist on adding an infix operator for Union, it should be ∪. The precise interpretation of the docstring can be debated, but at least the overall meaning of ∪ is “union” rather than “bitwise-or”.

1 Like

It would be good to temper expectations here — realize that there is a big difference between getting a bunch of self-selected participants to agree on a discourse thread/poll and getting a pull request merged.

12 Likes

For an infix operator for Union, my second choice would be ∨ rather than ∪, based on the Curry-Howard isomorphism I discussed above and Howard’s 1980 paper. I could not find as clear a precedent for ∪.

1 Like

The closest current Julia syntax I could come up with is as follows.

julia> macro question_mark(ex)
           return Expr(:curly, :Union, ex, :Missing)
       end
@question_mark (macro with 1 method)

julia> const var"@?" = var"@question_mark"
@question_mark (macro with 1 method)

julia> @? Int8
Union{Missing, Int8}

I mean, changes like these — surface, superficial, as-little-as-1-line-of-code, but super visible — are always hard to come to consensus. Everyone has an opinion — and obviously opinions vary.

The premise behind the poll is indeed flawed: it’s definitely not necessary to implement any change here, and “insisting” on a change isn’t what makes change happen. The status quo works and reigns supreme. Could it be better? Perhaps, but again, opinions on what makes “better” vary. In fact, all secondary forms here have the drawback of being yet-another-thing-to-learn (and likely hard to search for!).

7 Likes

Is a PR with a formal proposal from the interested parties the right next step then? We’ve certainly reached a length where it’s evidently hard for people to read the entire thing. I’m wary of this as we are drifting into a meta-discussion about the true meaning of forum polls…

2 Likes

I don’t really understand this as a disqualifier for ∪. There’s also an isomorphism between the union of sets and the union of types, and that isomorphism is arguably more straightforward than the one between structural logic and types, so I don’t see why that would be used to disqualify ∪.

Maybe the isomorphism doesn’t have as fancy a name though, so that may be a downside.

1 Like