Proposed alias for union types

mbauman · January 2, 2024, 5:11pm

I do think the demands of generic programming + multiple dispatch + duck-typing make consistent naming relatively more important in Julia than in other languages. Naming is hard, though, and it can indeed be blurry, but in this case, | is firmly documented as a bitwise or, and I maintain that is a very different meaning from a type union.
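Here is a minimal illustration of the two meanings being contrasted, using only Base. On integers, `|` is bitwise or. A type union is spelled `Union{...}` and is itself an ordinary value that can be bound to a name. The names `x` and `U` are just for illustration.

```julia
# On integers, `|` is bitwise or: 0b101 | 0b011 == 0b111
x = 5 | 3          # 7

# A type union is built with `Union{...}`; the result is a value like any other
U = Union{Int, String}
1 isa U            # true
"a" isa U          # true
1.5 isa U          # false
```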

There have been some very long threads here about what “meaning” itself means — here’s one decent entry-point: Function name conflict: ADL / function merging? - #136 by StefanKarpinski and the subsequent few posts.

So then these considerations get weighted alongside the aesthetic nicety of T | S over Union{T, S}. Is that really worth it?
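`Union{...}` flattens nested unions and ignores argument order and duplicates. A purely binary shorthand would therefore lose nothing, because chaining it yields the same type as the n-ary form. In the sketch below, `union2` is a hypothetical stand-in for whatever operator might be chosen. It is not a Base function.

```julia
# Hypothetical binary union constructor, standing in for a `T | S` shorthand
union2(S::Type, T::Type) = Union{S, T}

# Nested unions flatten, so chaining the binary form matches the n-ary form
union2(union2(Int, String), Nothing) == Union{Int, String, Nothing}          # true

# Order and duplicates do not matter
union2(String, Int) == union2(Int, String)                                 # true
reduce(union2, (Int, String, Nothing, Int)) == Union{Int, String, Nothing}  # true
```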

The place where this has been discussed before is in reference to Missing and Nothing — that’s where folks have more commonly run into unions and where (I agree) a shorthand would be nice. Check out:
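For reference, Base itself most commonly hands you a union in two places: arrays containing `missing`, and functions that may return `nothing`. The sketch below uses only documented Base functions (`skipmissing`, `findfirst`).

```julia
# Mixing Float64 and `missing` in a literal gives a union element type
v = [1.0, missing, 3.0]
eltype(v)                    # Union{Missing, Float64}
sum(skipmissing(v))          # 4.0

# `findfirst` returns an index or `nothing`, i.e. a value of type Union{Int, Nothing}
findfirst(==(2), [1, 2, 3])  # 2
findfirst(==(5), [1, 2, 3])  # nothing
```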

MilesCranmer · January 2, 2024, 5:24pm

PHP is another example, btw (which, like Python, also implemented this change fairly recently):

<?php
function foo(float|int $x): float|int {
  if (is_int($x)) {
    return $x | 3;
  } else {
    return 1.0;
  }
}
echo foo(1); # 3

But yes, Scala and TypeScript are static!

CameronBieganek · January 2, 2024, 5:36pm

Well, in your PHP example, float|int is only used in type context. Is there a PHP equivalent to the Python isinstance(x, float | int)? So far Python is the only example that allows float | int in a value context.

The argument is not “Scala and TypeScript are static languages so those examples don’t apply to Julia.” The argument is that in Scala and TypeScript the two different meanings of | are disambiguated by the context in which | occurs—either value context or type context. But, as I mentioned, Julia does not have a type context, only a value context.

MilesCranmer · January 2, 2024, 5:44pm

Oh, no. PHP doesn’t have that. They have gettype which gives you a string version of the type of a variable. And settype for setting (taking a string as input). But I think those types aren’t values in the same way as Julia and Python. I guess that’s what you meant!

But actually TypeScript is kind of like this, have you looked at their union types at all?

CameronBieganek · January 2, 2024, 5:47pm

Nope, I’m not too familiar with TypeScript.

MilesCranmer · January 2, 2024, 5:51pm

Pushing the logical operator connection further, TypeScript also uses & to build an “intersection” type from two concrete types — which has all the fields of both types… But that’s much less common than | for union types!
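Julia has no intersection operator. The closest Base analogue to TypeScript's `&` is `typeintersect`, which is documented as computing a type that contains the intersection (usually the smallest such type). One difference is worth noting. Julia's concrete types are nominal, so two distinct concrete types intersect to the empty type `Union{}`, not to a type with the fields of both. The simple cases below are exact.

```julia
# Intersecting two unions keeps only the shared member
typeintersect(Union{Int, String}, Union{Int, Float64})   # Int64

# An abstract type and one of its subtypes intersect to the subtype
typeintersect(Integer, Signed)                           # Signed

# Two unrelated concrete types have an empty intersection
typeintersect(Int, String)                               # Union{}
```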

MilesCranmer · January 2, 2024, 5:58pm

I didn’t realize @Mason had suggested this exact same syntax for Union back in 2020! Thanks for sending that link. The proposal must have gotten buried in the issue’s discussion about ? for missing.

I guess now that there’s a lot more precedent, the case for such an operator is hopefully stronger (or U if you prefer; I can’t write the actual operator from my phone, which is one of the reasons I prefer |).

CameronBieganek · January 2, 2024, 6:04pm

Here’s an example that emphasizes that in Julia types are just values:

julia> struct A end

julia> foo() = A
foo (generic function with 1 method)

julia> struct B
           x::foo()
       end

julia> B(A())
B(A())

julia> bar() = Float64
bar (generic function with 1 method)

julia> asdf(x::foo())::bar() = 1
asdf (generic function with 1 method)

julia> asdf(A())
1.0

julia> asdf(1)
ERROR: MethodError: no method matching asdf(::Int64)

Closest candidates are:
  asdf(::A)

I don’t really know PHP, but my guess is that you cannot replace

function foo(float|int $x): float|int { # ...

with

function foo(bar() $x): baz() { # ...

because those locations are type contexts where only type expressions are allowed. At least that would typically be the case with static languages, anyway.

MilesCranmer · January 2, 2024, 6:17pm

Wait why does this aspect of the languages inform whether | is used as shorthand for unions? (Just wondering)

jar1 · January 2, 2024, 7:24pm

?union says

Construct an object containing all distinct elements from all of the arguments.

is that inconsistent with IntervalSets usage? It would be nice if each function had a formal checkable spec, but without that it’s not obvious to me that IntervalSets is outside that spec. For example I’d expect

(x in s) || (x in t) iff x in (s ∪ t)

which seems to hold here but which doesn’t hold for types – types don’t even support in or contains.
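The check below tests that property for Base `Set`s and sets the nearest type-level analogue beside it. Types do not support `in`, so the type version has to be phrased with `isa`. The particular sets, types, and test values are arbitrary.

```julia
s = Set([1, 2, 3])
t = Set([3, 4])
u = s ∪ t       # Set([1, 2, 3, 4]); printed order may vary

# Membership in the union agrees with "in s or in t" for every test value
all(((x in s) || (x in t)) == (x in u) for x in 0:5)    # true

# For types, membership is spelled `isa` rather than `in`
IntOrString = Union{Int, String}
all(((y isa Int) || (y isa String)) == (y isa IntOrString) for y in (1, "a", 2.5, nothing))  # true
```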

CameronBieganek · January 2, 2024, 7:38pm

Because some languages have syntactic type contexts, which are syntactic locations where only type expressions can be written. In those languages the syntactic location of the | symbol disambiguates between the two possible meanings of |. In those languages, you can think of the two uses of | as corresponding to two entirely separate operators that happen to have the same name. It’s akin to having | from two separate modules, e.g. ValueContext.| and TypeContext.|.
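The `ValueContext.|` / `TypeContext.|` analogy can be written out literally in Julia. In this sketch the two module names are hypothetical. Each module defines its own function named `|` instead of extending `Base.:|`, so the two never share a method table, and the caller picks one by qualifying the name.

```julia
module ValueContext
    # A local function named `|` (not an extension of Base.:|) for integers
    (|)(a::Integer, b::Integer) = Base.:|(a, b)
end

module TypeContext
    # A separate local function named `|` that builds a type union
    (|)(S::Type, T::Type) = Union{S, T}
end

ValueContext.:|(5, 3)         # 7
TypeContext.:|(Int, String)   # Union{Int64, String}
```

In a language with syntactic type contexts, the parser makes this choice based on where `|` appears. In Julia, the choice would have to be made by dispatch on the argument values, which is the problem the REPL session in this post illustrates.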

Here’s a method definition in Scala:

def add(x: Int, y: Int): Int = x + y

The locations where the Int occurs are syntactically defined to be type contexts—only type expressions can occur in those locations. So, you cannot write this:

val a = ...
val b = ...
def add(x: a|b, y: a|b): a|b = x + y

(Well, I suspect that’s true, but I don’t actually know Scala either.) Thus, there is no ambiguity about the meaning of | when you write this:

def foo(x: Int|String): Int|String = ...

But there is ambiguity in Julia because every syntactic location is a value context:

julia> Base.:|(S, T) = Union{S, T}

julia> Int = 1; String = 2;

julia> Int | String
3

julia> foo(x::(Int|String)) = x
ERROR: ArgumentError: invalid type for argument x in method definition for foo at REPL[7]:1

CameronBieganek · January 2, 2024, 7:42pm

Yeah, that docstring is a smidge vague. To me, the use of the word “elements” implies that the arguments are expected to be collections, but the docstring does not really clarify that.

…And given that length is listed in the manual as one of the methods to define for a general collection, that seems to imply that collections are finite. But, as has been discussed many times before, many of our interfaces are a bit hazily defined…

nsajko · January 2, 2024, 8:26pm

What’s there to clarify (honest question)? Maybe open an issue/PR?

There’s the IteratorSize trait: Interfaces · The Julia Language
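On the finiteness question, the trait separates iterators with a known shape or length from those whose size is unknown or infinite. These three calls use only Base and `Base.Iterators`.

```julia
# A range has a known one-dimensional shape
Base.IteratorSize(1:3)                             # Base.HasShape{1}()

# A filtered iterator cannot know its length in advance
Base.IteratorSize(Iterators.filter(iseven, 1:10))  # Base.SizeUnknown()

# `countfrom` counts up forever
Base.IteratorSize(Iterators.countfrom(1))          # Base.IsInfinite()
```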

CameronBieganek · January 2, 2024, 9:56pm

It’s not clear whether the arguments to union are expected to be collections, iterators, sets, or something else.

(The definition of a collection in Julia is rather hazy to me. It appears that the primary methods required to implement the “collection interface” are isempty and length.)

At any rate, the intervals in IntervalSets are neither iterators nor collections:

julia> iterate(1..2)
ERROR: MethodError: no method matching iterate(::ClosedInterval{Int64})

julia> length(1..2)
ERROR: MethodError: no method matching length(::ClosedInterval{Int64})
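To make the distinction concrete, here is a toy closed-interval type. It is hypothetical and not IntervalSets' implementation. It supports `in` and `∪` without being iterable, yet still satisfies the `(x in s) || (x in t)` property. To keep the result a single interval, `∪` is defined here only for overlapping intervals.

```julia
# Toy closed interval [lo, hi]; `Iv` is a hypothetical name, not from IntervalSets
struct Iv
    lo::Float64
    hi::Float64
end

# Membership is defined directly, with no iteration involved
Base.in(x::Real, i::Iv) = i.lo <= x <= i.hi

# The union of two overlapping intervals is again an interval
function Base.union(a::Iv, b::Iv)
    (a.hi < b.lo || b.hi < a.lo) && throw(ArgumentError("intervals do not overlap"))
    return Iv(min(a.lo, b.lo), max(a.hi, b.hi))
end

s, t = Iv(0, 2), Iv(1, 3)
u = s ∪ t                                                  # Iv(0.0, 3.0)
all(((x in s) || (x in t)) == (x in u) for x in -1:0.5:4)  # true
applicable(iterate, u)                                     # false: not iterable
```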

nsajko · January 2, 2024, 10:05pm

The way I understand it, in the Manual, “collection” is used in the general (programming language/computer science) sense, without a formal Julia-specific meaning. “Iterator” is more formal because there’s an actual protocol/interface that must be conformed to (see linked Manual page).

Perhaps it would be good to make the terminology use in the Manual more consistent and a bit more formal.

As explained on the Interfaces page of the Manual, the most important function to overload is iterate. All others are optional in general.
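The sketch below illustrates this with a hypothetical `Countdown` type. Once `iterate` is defined, generic functions such as `collect`, `sum`, and `in` work without further methods. `length` is added only because the default `IteratorSize` trait is `HasLength()`, which `collect` relies on. Declaring `Base.IteratorSize(::Type{Countdown}) = Base.SizeUnknown()` would be the alternative.

```julia
# Hypothetical iterator yielding start, start-1, ..., 1
struct Countdown
    start::Int
end

# The one essential method: return (item, next_state), or `nothing` when done
Base.iterate(c::Countdown, state = c.start) = state < 1 ? nothing : (state, state - 1)

# Optional extras: the default IteratorSize is HasLength(), so provide `length`,
# plus an `eltype` so `collect` can build a Vector{Int}
Base.length(c::Countdown) = max(c.start, 0)
Base.eltype(::Type{Countdown}) = Int

collect(Countdown(3))   # [3, 2, 1]
sum(Countdown(4))       # 10
3 in Countdown(5)       # true
```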

CameronBieganek · January 2, 2024, 10:12pm

Of course I have read the Interfaces page of the manual. I was not referring to the iterator interface, I was referring to this section of the manual, which does imply that isempty and length are the two main methods required for something to be called a “general collection”. According to that page, the “general collection” interface is “fully implemented by”

AbstractRange
UnitRange
Tuple
Number
AbstractArray
BitSet
IdDict
Dict
WeakKeyDict
AbstractString
Set
NamedTuple

There are other ill-defined interfaces on that page, like “Dictionaries” and “Set-Like Collections”.

But perhaps I’m reading too much into that page and they should not be considered “interfaces”.
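Here is how the two methods behave for a few of the listed types (Base only). Note in particular that a `Number` counts as a one-element collection.

```julia
length(1:3)                     # 3
isempty(())                     # true
length((1, "a"))                # 2
length(5)                       # 1: a Number acts as a one-element collection
isempty(5)                      # false
length(Dict(:a => 1, :b => 2))  # 2
length("héllo")                 # 5 characters (the string has 6 bytes)
isempty(Set{Int}())             # true
```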

Janis_Erdmanis · January 2, 2024, 11:24pm

Perhaps to avoid ambiguity while preserving the similarity with Python and other languages that use | as a shorthand for union types, we could cheat a little. There is a similar symbol, ∣ (\mid), which is unused, so it could take on the role that | plays in those languages. The difference is noticeable enough (compare A|B with A∣B) that users would not confuse the two.

Benny · January 2, 2024, 11:42pm

I personally don’t like tab-completed characters in general, but I think it would be a particularly bad idea to use one for something that would be typed often and in succession, like (A|B|C|D|E). Sure, custom key bindings and copy-pasting can speed it up a bit, but I just see that as a complication. There’s a reason standard keyboard characters like * are used for multiplication instead of the far more commonly handwritten × and ⋅; as an aside, there is a Julia blog post about this.

Janis_Erdmanis · January 3, 2024, 1:02am

Union types are mostly used by package developers. I often use them in the form Union{MyType, Nothing}. For that audience, a slightly more hidden character is acceptable.
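A typical package-side use of that pattern is a struct with an optional field typed `Union{Int, Nothing}`, with Base's `something` supplying a default. In this minimal sketch, `Settings` and `effective_timeout` are hypothetical names.

```julia
# Hypothetical settings struct; `nothing` in `timeout` means "not set"
struct Settings
    name::String
    timeout::Union{Int, Nothing}
end

# `something` returns its first argument that is not `nothing`
effective_timeout(s::Settings) = something(s.timeout, 30)

effective_timeout(Settings("a", 5))        # 5
effective_timeout(Settings("b", nothing))  # 30
```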

A nice thing about \mid is that it is already, in a sense, on every keyboard. We already use the keyboard shortcut Shift + \ when typing |. Why not simply have a shortcut for a slightly different version, Option + \, which types ∣? It is easy to remember because it is right there.

mkitti · January 3, 2024, 2:34am

Introduction

I dug into the topic further to try to find its origins. I was not quite satisfied with the Python discussion, which mostly pointed at Scala, but I thought that would make a good starting point. I ended up tracing the origins to the Curry-Howard isomorphism, which suggests that logical or is indeed the historical operator.

Scala 3

I tried to trace the early origins of the use of | in Scala 3. One important note is that Scala 3 also has intersection types, which use & as the operator.

TypeScript

Note that TypeScript similarly uses | and & for union and intersection types.

Scala.js

Some of the precedent seems to go back to an earlier implementation in Scala.js:

github.com/scala-js/scala-js

js.| union type · sjrd · enhancement · opened 04:44PM - 24 Jul 15 UTC, closed 08:41PM - 28 Jul 15 UTC

Edit: the original proposal was to name it `\/`.

It would be worth thinking about a union type constructor for facade types. Similar to `js.UndefOr[+A]` in its construction, `js.|[+A, +B]`, or whatever it's called, would represent a value of type `A` or `B`. It would have zero API, though. The basic idea is the following:

```scala
@RawJSType // don't do this at home
sealed trait |[+A, +B]
object | {
  implicit def fromA[A](a: A): A | Nothing = a.asInstanceOf[A | Nothing]
  implicit def fromB[B](b: B): Nothing | B = b.asInstanceOf[Nothing | B]
}
```

Then of course we need ways to transitively convert an `A`, a `B`, or a `C` to an `A | B | C`. This probably needs a bit of recursive implicit magic. Even trickier: convert `A | B`, `A | C` and `B | C` to `A | B | C` ^^

There, the operator originally proposed was \/.

Curry-Howard Isomorphism (correspondence)

Reading further back, I found this blog post by Miles Sabin in 2011:

https://milessabin.com/blog/2011/06/09/scala-union-types-curry-howard/

This in turn refers to the Curry-Howard isomorphism relating type theory and logic (propositions as types, proofs as programs).

https://en.m.wikibooks.org/wiki/Haskell/The_Curry–Howard_isomorphism

There, the operators are ∨ and ∧ (\vee and \wedge, respectively). ∨ is the logical disjunction operator.

Note that, in Boolean algebra, logical disjunction is what we usually use | for today.

See W. A. Howard. The Formulae-as-Types Notion of Construction. 1969, 1980.

If we keep reading, I suspect we will find some origin of the use of | in ML, particularly Standard ML.

Summary

The use of | for type unions originates from the following.

The Curry-Howard isomorphism relates type theory with logic. Through the isomorphism, a type union corresponds with logical or.
In programming languages, we usually write ∨ as |.

Several people here have proposed using ∪, the set-theoretic union. Note the visual similarity to ∨; apparently this is not a coincidence. However, Curry-Howard refers to logic, not sets.

Above, there are arguments that | is the bitwise or, as currently documented. I think that definition only makes sense for bitstype instances. Rather, the correct definition should be that | means “logical or” or “logical disjunction”.
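A "logical or" reading covers both existing uses in Base. On `Bool` the operator is plain logical or, and on integers bitwise or is just that logical or applied at each bit position. The small check below uses `bit` as a hypothetical helper.

```julia
# Hypothetical helper: is bit k of x set?
bit(x, k) = (x >> k) & 1 == 1

a, b = 0b1100, 0b1010
c = a | b                                                  # 0x0e, i.e. 0b1110

# Each bit of `a | b` is the logical or of the corresponding bits of a and b
all(bit(c, k) == (bit(a, k) || bit(b, k)) for k in 0:7)    # true

# On Bool, `|` is plain (non-short-circuiting) logical or
true | false                                               # true
```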

I see the points about sum types and tagged unions, but I do not currently see them becoming a fundamental part of the language in Julia 1, beyond implementations in packages such as SumTypes.jl.

In light of this background, I am more in favor of using | for this purpose than before I started digging into the topic.
