Proposed alias for union types

I do think the demands of generic programming + multiple dispatch + duck-typing make consistent naming relatively more important in Julia than in other languages. Naming is hard, though, and it can indeed be blurry, but in this case, | is firmly documented as a bitwise or, and I maintain that is a very different meaning from a type union.

There have been some very long threads here about what “meaning” itself means — here’s one decent entry-point: Function name conflict: ADL / function merging? - #136 by StefanKarpinski and the subsequent few posts.

So then these considerations get weighted alongside the aesthetic nicety of T | S over Union{T, S}. Is that really worth it? :person_shrugging:

The place where this has been discussed before is in reference to Missing and Nothing — that’s where folks have more commonly run into unions and where (I agree) a shorthand would be nice. Check out:


PHP is another example, btw (which, like Python, also implemented this change fairly recently):

<?php
function foo(float|int $x): float|int {
  if (is_int($x)) {
    return $x | 3;
  } else {
    return 1.0;
  }
}
echo foo(1); # 3

But yes Scala and TypeScript are static!


Well, in your PHP example, float|int is only used in type context. Is there a PHP equivalent to the Python isinstance(x, float | int)? So far Python is the only example that allows float | int in a value context.

The argument is not “Scala and TypeScript are static languages so those examples don’t apply to Julia.” The argument is that in Scala and TypeScript the two different meanings of | are disambiguated by the context in which | occurs—either value context or type context. But, as I mentioned, Julia does not have a type context, only a value context.
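To make the "only a value context" point concrete, here is a small Julia sketch (the names `U` and `describe` are made up for illustration). A `Union` is an ordinary value: it can be bound to a variable, tested with `isa` (roughly the counterpart of Python's isinstance(x, float | int)), and then used in a method signature.

```julia
# A Union is an ordinary value: it can be assigned to a variable.
U = Union{Int, Float64}

# Value-level use, comparable to Python's isinstance(x, float | int).
@show 1 isa U
@show "one" isa U

# The same value used in a method signature (hypothetical function name).
describe(x::U) = "number"
describe(x) = "something else"

@show describe(2.5)
@show describe("two")
```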

Oh, no. PHP doesn’t have that. They have gettype which gives you a string version of the type of a variable. And settype for setting (taking a string as input). But I think those types aren’t values in the same way as Julia and Python. I guess that’s what you meant!

But actually TypeScript is kind of like this, have you looked at their union types at all?

Nope, I’m not too familiar with TypeScript. :slight_smile:

Pushing the logical operator connection further, TypeScript also uses & to build an “intersection” type from two concrete types — which has all the fields of both types… But that’s much less common than | for union types!

I didn’t realize @Mason had suggested this exact same syntax for Union back in 2020! Thanks for sending that link. The proposal must have gotten buried in the issue’s discussion about ? for missing.

I guess now that there’s a lot more precedent the case is hopefully stronger for such an operator (or U if you prefer — I can’t write the actual operator from my phone :sweat_smile:, one of the reasons I prefer |)

Here’s an example that emphasizes that in Julia types are just values:

julia> struct A end

julia> foo() = A
foo (generic function with 1 method)

julia> struct B
           x::foo()
       end

julia> B(A())
B(A())

julia> bar() = Float64
bar (generic function with 1 method)

julia> asdf(x::foo())::bar() = 1
asdf (generic function with 1 method)

julia> asdf(A())
1.0

julia> asdf(1)
ERROR: MethodError: no method matching asdf(::Int64)

Closest candidates are:
  asdf(::A)

I don’t really know PHP, but my guess is that you cannot replace

function foo(float|int $x): float|int { # ...

with

function foo(bar() $x): baz() { # ...

because those locations are type contexts where only type expressions are allowed. At least that would typically be the case with static languages, anyways.


Wait why does this aspect of the languages inform whether | is used as shorthand for unions? (Just wondering)

?union says

Construct an object containing all distinct elements from all of the arguments.

is that inconsistent with IntervalSets usage? It would be nice if each function had a formal checkable spec, but without that it’s not obvious to me that IntervalSets is outside that spec. For example I’d expect

(x in s) || (x in t) iff x in (s ∪ t)

which seems to hold here but which doesn’t hold for types – types don’t even support in or contains.
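As a quick check of that property, here is a sketch using `Set`s from Base (the helper name `union_property_holds` is made up for this example). It also shows that `in` has no meaning for a type, where the membership question is spelled `isa` instead.

```julia
# Hypothetical helper: checks (x in s) || (x in t) == (x in (s ∪ t))
# for a handful of probe values.
union_property_holds(s, t, probes) =
    all(x -> ((x in s) || (x in t)) == (x in (s ∪ t)), probes)

s = Set([1, 2, 3])
t = Set([3, 4])
@show union_property_holds(s, t, 0:5)

# For types, membership is asked with `isa`, not `in`.
@show 1 isa Union{Int, String}

# `in` is not defined for types; this call is expected to throw.
try
    1 in Int
catch err
    @show typeof(err)
end
```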


Because some languages have syntactic type contexts, which are syntactic locations where only type expressions can be written. In those languages the syntactic location of the | symbol disambiguates between the two possible meanings of |. In those languages, you can think of the two uses of | as corresponding to two entirely separate operators that happen to have the same name. It’s akin to having | from two separate modules, e.g. ValueContext.| and TypeContext.|.
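In Julia terms, that picture could be sketched with two modules that each see a different `|`. The module names follow the analogy above; the helper names `or` and `union_of` and the definitions themselves are assumptions made for illustration, not anything that exists in Base or in a package.

```julia
module ValueContext
    # In this module, `|` is simply Base's existing operator.
    or(a, b) = a | b
end

module TypeContext
    # A brand-new function that happens to be named `|`. It shadows Base's `|`
    # inside this module and only accepts types.
    |(S::Type, T::Type) = Union{S, T}
    union_of(S, T) = S | T
end

@show ValueContext.or(1, 2)
@show TypeContext.union_of(Int, String)
```

Calling `TypeContext.union_of(1, 2)` would raise a `MethodError`, since that module's `|` has no method for integers. The two functions only share a name.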

Here’s a method definition in Scala:

def add(x: Int, y: Int): Int = x + y

The locations where the Int occurs are syntactically defined to be type contexts—only type expressions can occur in those locations. So, you cannot write this:

val a = ...
val b = ...
def add(x: a|b, y: a|b): a|b = x + y

(Well, I suspect that’s true, but I don’t actually know Scala either. :sweat_smile: ) Thus, there is no ambiguity about the meaning of | when you write this:

def foo(x: Int|String): Int|String = ...

But there is ambiguity in Julia because every syntactic location is a value context:

julia> Base.:|(S, T) = Union{S, T}

julia> Int = 1; String = 2;

julia> Int | String
3

julia> foo(x::(Int|String)) = x
ERROR: ArgumentError: invalid type for argument x in method
definition for foo at REPL[7]:1

Yeah, that docstring is a smidge vague. To me, the use of the word “elements” implies that the arguments are expected to be collections, but the docstring does not really clarify that.

…And given that length is listed in the manual as one of the methods to define for a general collection, that seems to imply that collections are finite. But, as has been discussed many times before, many of our interfaces are a bit hazily defined…


What’s there to clarify (honest question)? Maybe open an issue/PR?

There’s the IteratorSize trait: Interfaces · The Julia Language

It’s not clear whether the arguments to union are expected to be collections, iterators, sets, or something else.

(The definition of a collection in Julia is rather hazy to me. It appears that the primary methods required to implement the “collection interface” are isempty and length.)

At any rate, the intervals in IntervalSets are neither iterators nor collections:

julia> iterate(1..2)
ERROR: MethodError: no method matching iterate(::ClosedInterval{Int64})

julia> length(1..2)
ERROR: MethodError: no method matching length(::ClosedInterval{Int64})

The way I understand it, in the Manual, “collection” is used in the general (programming language/computer science) sense, without a formal Julia-specific meaning. “Iterator” is more formal because there’s an actual protocol/interface that must be conformed to (see linked Manual page).

Perhaps it would be good to make the terminology use in the Manual more consistent and a bit more formal.

As explained on the Interfaces page of the Manual, the most important function to overload is iterate. All others are optional in general.
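For reference, a minimal sketch of that iteration protocol, using a made-up `Countdown` type (not from Base or any package). Only `iterate` is needed for `for` loops; `length` and `eltype` are added here so that functions such as `collect` know the size and element type up front.

```julia
# Hypothetical example type: counts down from `start` to 1.
struct Countdown
    start::Int
end

# The one required method: return (item, next_state), or `nothing` when done.
Base.iterate(c::Countdown, state=c.start) =
    state < 1 ? nothing : (state, state - 1)

# Optional methods. The default IteratorSize trait is HasLength(),
# so `collect` expects `length` to be defined.
Base.length(c::Countdown) = max(c.start, 0)
Base.eltype(::Type{Countdown}) = Int

@show collect(Countdown(3))
@show 2 in Countdown(3)

# Generic functions such as `union` then accept the iterator as well.
@show union(Countdown(2), [5])
```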


Of course I have read the Interfaces page of the manual. I was not referring to the iterator interface, I was referring to this section of the manual, which does imply that isempty and length are the two main methods required for something to be called a “general collection”. According to that page, the “general collection” interface is “fully implemented by”

There are other ill-defined interfaces on that page, like “Dictionaries” and “Set-Like Collections”.

But perhaps I’m reading too much into that page and they should not be considered “interfaces”. :joy:

Perhaps, to avoid ambiguity while preserving the similarity with Python and other languages that use | as a shorthand for union types, we could cheat a little. There is a similar symbol, \mid (∣), which is unused; thus, all the relevant precedence could be set up for it. The difference between A|B and A∣B is noticeable enough that users would not confuse the two.

I personally don’t like tab-completed characters in general, but I think it would be a particularly bad idea to use one for something that would be typed often and in succession, like (A|B|C|D|E). Sure, custom key bindings and copy-pasting can speed it up a bit, but I just see that as a complication. There’s a reason a standard keyboard character like * is used for multiplication instead of the far more commonly written × and ⋅; as an aside, there is a Julia blogpost about this.


Union types are generally something that is mostly used by package developers. I often use them in the form Union{MyType, Nothing}. In this regard, slightly more hidden characters can be allowed.
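A typical instance of that Union{MyType, Nothing} pattern, sketched with made-up names (`Config`, `timeout`, and `effective_timeout` are hypothetical, and the 30-second default is arbitrary):

```julia
# Hypothetical settings type with an optional field.
struct Config
    name::String
    timeout::Union{Float64, Nothing}   # `nothing` means "no timeout set"
end

# `something` returns its first argument that is not `nothing`.
effective_timeout(c::Config) = something(c.timeout, 30.0)

@show effective_timeout(Config("default", nothing))
@show effective_timeout(Config("fast", 2.5))
```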

A nice thing about \mid is that it is already kind of on every keyboard. We already use the keyboard shortcut Shift + \ when typing |. Why not simply have a shortcut for a slightly different version, Option + \, which types ∣? Easy to remember as it is just there.

Introduction

I dug into the topic further to try to find its origins. I was not quite satisfied with the Python discussion that mostly pointed at Scala, but I thought that would make a good starting point. I ended up tracing the origins to the Curry-Howard isomorphism, which suggests that logical or is indeed the historical operator.

Scala 3

I was trying to read into the early origins of the use of | in Scala 3. One important note is that Scala 3 also has intersection types, which use & as an operator.

TypeScript

Note that TypeScript similarly used | and & for Union and Intersection types.

Scala.js

Some of the precedent seems to go back to an earlier implementation in scala.js:

There the original operator proposed was \/.

Curry-Howard Isomorphism (correspondence)

Reading further back, I found this blog post by Miles Sabin in 2011:

https://milessabin.com/blog/2011/06/09/scala-union-types-curry-howard/

This in turn refers to the Curry-Howard isomorphism relating type theory and structural logic.

https://en.m.wikibooks.org/wiki/Haskell/The_Curry–Howard_isomorphism

There the operators are ∨ and ∧, \vee and \wedge respectively. ∨ is the logical disjunction operator.

Note that logical disjunction in terms of boolean algebra is what we usually use | for now.

See W. A. Howard. The Formulae-as-Types Notion of Construction. 1969, 1980.

If we keep reading, I think we will see some origin of using | in ML, particularly Standard ML.

Summary

The use of | for type unions originates from the following.

  • The Curry-Howard isomorphism relates type theory with structural logic. Through the isomorphism, a type union corresponds to logical or (∨).
  • In programming languages, we usually have ∨ written as |.

Several here have proposed using ∪, the set theory union. Note the visual similarity to ∨. Apparently this is not a coincidence. However, Curry-Howard refers to logic, not sets.

Above there are arguments that | is the bitwise or, as currently documented. I think that definition only makes sense for bitstype instances. Rather, the correct definition should be that | refers to “logical or” or “logical disjunction”.
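For what it is worth, a few results from current Julia fit both readings (this is only a sketch of existing Base behavior, not a proposal): on integers | acts bit by bit, while on `Bool` and `missing` it behaves as a logical or.

```julia
# On integers, `|` combines the operands bit by bit.
@show 0b1010 | 0b0110

# On Bool, it is a logical or (without short-circuiting, unlike `||`).
@show true | false

# With `missing`, it follows three-valued logic.
@show true | missing
@show false | missing
```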

I see the points about sum types, tagged unions, but I do not currently see that becoming a fundamental part of the language in Julia 1 beyond its implementation in a package such as SumTypes.jl.

In light of this background, I am more in favor of using | for this purpose than before I started digging into the topic.
