Proposed alias for union types

The clearest pull request might be the new method for Base.:| along with tests. Another route would be a Github Discussion RFC for a technical discussion.

What I think we should try to converge on here is what would be the best technical definition to put forth IF we were to try to move forward along with a coherent motivation.

Miles has changed the definition a few times, so I think it might be worth discussing the implementation.

  1. Base.:|(t::Type...) = Union{t...} # Original
  2. Base.:|(t::Type, types::Types...) = Union{t, types...} by Mark Kittisopikul
  3. Base.:|(::Type{A}, ::Type{B}) where {A,B} = Union{A, B} by Mason Protter, current

Most of the work is still done by Union{...}. My thought here is that we should minimize additional compilation and despecialize. This is the opposite of the other related discussion on specialization.

  1. @inline |(@nospecialize(t::Type), @nospecialize(ts::Type...)) = Union{t, ts...}

It would be good to work out as much as possible here before taking another step.

Whether that pull request would go anywhere is a whole other matter. It might be useful to have a more technical discussion there and perhaps get some comments from triage.

What would convince me is if you could find some precedent where is used in the context of type theory. My first choice here would be the following precedent in programming languages. My second choice is to follow precedent from the mathematical notation used in type theory. What I do not want to do is make up a new notation in a vacuum.

2 Likes

Pull requests are indeed how changes happen, but for controversial things it can be helpful to gather consensus in an issue or discourse topic like this one to get a sense of their likelihood of acceptance before spending time on such a thing. Personally, I don’t think an infix alias is gonna happen (but I’m just one voice!), just in my own weighting of the pros and cons:

Pros:

  • Yeah, (Int | Missing) looks slightly nicer than Union{Int, Missing}.
  • Using | would mirror other languages.

Cons:

  • There is yet another way to do (and learn) it.
  • The obvious operator from other languages — | — definitively does not match its existing use as bitwise or in Julia.
  • Using the union function — — requires unicode for infix, making things slightly harder to write. I think it’s also slightly confusing that you could now write either union(T, S) or Union{T, S}.
  • Exporting a new name — like or — isn’t without its downsides either; it’s also unicode, visually similar-but-different, and the new export itself can cause a bit of trouble as a good number of packages already define themselves. It’s not clear to me how many would be compatibly-defined enough to merge into the newly-exported base functionality. It’s not major, but it’s worth weighing.

So: Do the pros outweigh the cons? I’m with Guido (nice find, @mkitti) — I don’t think infix is a big enough improvement to be worth it. That’s why I do like the overridable post-fix operator idea — but @Mason is spot on with the fiddliness of T? a : b. Like I said, change needn’t happen :slight_smile:

10 Likes

Exporting a new name — like or — isn’t without its downsides either; it’s also unicode, visually similar-but-different…

Note that at the moment, it is not possible to define a function with \mid as it seems not to be included by the parser. Thus, at the moment, one can set its precedence without any worries about breaking existing code. I personally would use it often if I could write down a struct with optional arguments as:

struct MyType
    x::Int
    y::Int∣Nothing
end

Alternativelly, if I had to choose typing Union{Int,Nothing} vs (Int|Nothing), I would choose the former as | does not make typing much easier and the former syntax is more established.

Ah… consensus on discourse… quite a high bar :sweat_smile:

FWIW I think this was a misreading, because by this:

let’s see what we can do with the existing operators

Guido meant using the existing operator, | – because it had already existed for 1 | 2 == 3. (Which is why they have indeed gone through and implemented it…)

I’m just saying it can be helpful to take the temperature to get a sense of likelihood of acceptance — but that can indeed sometimes be hard to see here. Many of the folks you’d really need to convince aren’t participating in this thread.

I don’t see that at all. He’s saying that using ?T or T? requires changing Python’s syntax rules (just as it would in Julia). So he moves on to considering two candidates that parse as existing operators: ~ and |.

I think ~T looks okay. I’m currently working on annotating a very large codebase, and Optional[T] is so frequent that I think T | None would not be enough of an improvement.

Emphasis mine. It’s always all about tradeoffs. They ended up deciding that it was worth it, sure, but their tradeoffs are different than ours. I wonder if he wrote about what convinced him (if he did get convinced)…

4 Likes

Changing the language syntax is controversial and has all this compatibility things to consider and all that. But, just for personal testing, I think people can change an unimportant bit only, the output format of the types. Input still lengthy with Union and all and compatible etc (and this is not a recommendation, as it breaks the iron-rule of output pastability).
To try it out, I’ve chosen a syntax as follows:

  • Union{A, B} will be (A | B)
  • Union{A, Nothing} will be Aᵒ (this is \^o)
  • Union{A, Missing} will be A?

With that, one example in the thread looks like:

julia> typeof(((a=[1f0, missing], b=[missing, "hello"], c=[1, missing])))
@NamedTuple{a::Vector{Float32?}, b::Vector{String?}, c::Vector{Int64?}}

julia> Union{Int, Nothing}
Int64ᵒ

To get this change:

julia> import Base: _show_type

julia> function _show_type(io::IO, @nospecialize(x::Type))
           if Base.print_without_params(x)
               Base.show_type_name(io, (Base.unwrap_unionall(x)::DataType).name)
               return
           elseif get(io, :compact, true)::Bool && Base.show_typealias(io, x)
               return
           elseif x isa DataType
               Base.show_datatype(io, x)
               return
           elseif x isa Union
               if get(io, :compact, true)::Bool && Base.show_unionaliases(io, x)
                   return
               end
               uts = Base.uniontypes(x)
               hasnothing = false
               hasmissing = false
               if length(uts)>1
                   pos = findfirst(==(Nothing), uts)
                   if !isnothing(pos)
                       hasnothing = true
                       deleteat!(uts, pos)
                   end
               end
               if length(uts)>1
                   pos = findfirst(==(Missing), uts)
                   if !isnothing(pos)
                       hasmissing = true
                       deleteat!(uts, pos)
                   end
               end
               if length(uts)>1
                   print(io, "")
                   Base.show_delim_array(io, uts, '(', " |", ')', false)
               else
                   print(io, uts[1])
               end
               hasmissing && print(io, "?")
               hasnothing && print(io, "ᵒ")
               return
           end

           x = x::UnionAll
           wheres = TypeVar[]
           let io = IOContext(io)
               while x isa UnionAll
                   var = x.var
                   if var.name === :_ || Base.io_has_tvar_name(io, var.name, x)
                       counter = 1
                       while true
                           newname = Symbol(var.name, counter)
                           if !Base.io_has_tvar_name(io, newname, x)
                               var = TypeVar(newname, var.lb, var.ub)
                               x = x{var}
                               break
                           end
                           counter += 1
                       end
                   else
                       x = x.body
                   end
                   push!(wheres, var)
                   io = Base.IOContext(io, :unionall_env => var)
               end
               if x isa DataType
                   Base.show_datatype(io, x, wheres)
               else
                   show(io, x)
               end
           end
           Base.show_wheres(io, wheres)
       end
_show_type (generic function with 1 method)

This is a patch, of course, to get the idea for a bit. It even does both Missing and Nothing:

julia> Union{String, Missing, Nothing}
String?ᵒ

julia> Union{Int, String, Missing, Nothing}
(Int64 | String)?ᵒ

I’m not endorsing any change at all (I’m at the status-quo is fine also camp). But I do dislike too long printouts.

2 Likes

This is why I wondered if we should just make a PR about it so we can get the attention of the off-discourse folks.

Apart from a few who are in strong opposition, I did sense an overall welcoming towards the idea (or a close variant) from a majority of the (self-selected) people in this thread. But I wonder if the majority of Julia veterans (who would have the most say) would also be in opposition or not.

1 Like

Usually (with rare exceptions) we want the output format of show (hence repr) to be valid Julia syntax, i.e. so that eval(Meta.parse(repr(foo))) == foo (in global scope).

4 Likes

Perhaps another tip on the pull request is what NOT to do. The ? issue was not very good. You do not want Jeff Benzanson to take a look and be grumpy about what he sees.

If you have not already, take a look at julia/CONTRIBUTING.md at master · JuliaLang/julia · GitHub

When you submit a PR to the main repository, the purpose is not to get the attention of the off-discourse folks. If that’s what you want to do, use the Github Discussion route.

A good PR to julialang/julia is necessary, clear, concise, and actionable. The less discussion that is needed there the better.

2 Likes

Sure. That’s why the rest of the sentence surrounding the quote went (emphasis added):

Regarding this comment:

Just curious – would this actually be an issue? If a package is defining but not overloading Base.:(∨), then it wouldn’t actually be type piracy, right?

This is similar to how I could potentially do:

module MyPkg

|(::Type{A}, ::Type{B}) where {A,B} = Union{A,B}

function f(x::Vector{Float64|Missing})
    return sum(xi -> ismissing(xi) ? 0.0 : xi, x)
end

end

Which means it’s a different symbol than Base:

julia> MyPkg.:(|) == Base.:(|)
false

Julia has many infix operators already – Proposed alias for union types - #44 by MilesCranmer. Some I didn’t even know about. I’m surprised there are exports for even uncommon symbols like ⊈ and ⊽.

By the way, @mbauman, in doing some self-analysis, maybe one of the other reasons | feels so natural to me (and perhaps others) is because of its behavior in regexp. In particular the (A|B) is a capture group that can capture a match for A or a match for B.

For example –

julia> occursin(r"(A|B)", "A")
true

which is why this syntax looks so intuitive (with abstract type A end; abstract type B end;) –

julia> (A|B) >: A
true
3 Likes

It’s technically not considered a breaking change, but it does cause trouble, because scripts like this stop working:

using SomePackage
1 ∨ 2

Here’s a MWE:

julia> module MyBase
       ∨(x, y) = 10
       export ∨
       end
Main.MyBase

julia> module SomePackage
       ∨(x, y) = 20
       export ∨
       end
Main.SomePackage

julia> using .MyBase

julia> using .SomePackage

julia> 1 ∨ 2
WARNING: both SomePackage and MyBase export "∨"; uses of it in
module Main must be qualified
ERROR: UndefVarError: `∨` not defined
1 Like

I have been thinking about this, and I’m not sure if it accurate to say there is an isomorpism between the union of sets and the union of types.

An isomorpism requires both a relationship and a structure to be preserved through the relationship.

Let’s say there are sets, sA and sB , and types, tA and tB, such that there is a bijective mapping Ω between sA ∪ sB and tA ∨ tB. There are elements belong to the set union and type instances of the type union. If there is an isomorphism, then I should be able take an arbitrarily subset of sA ∪ sB and bijectively map it through Ω to some collection of instances that isa tA ∨ tB such that all subsets map to all possible collections of instances that have the relationship isa tA ∨ tB.

Now construct new instances that have the relationship isa tA ∨ tB. They may have an equivalence relation to existing instances of the type union, but they are not the same. Through the bijective mapping Ω I should be able find corresponding subset of the set union, but that set union already maps through Ω to other elements of the type union. Thus, the mapping and the isomorphism cannot exist. Proof by contradiction.

The source of the contradiction is a fundamental difference between the structure of sets and types. Sets are extensional consisting of elements which belong to the set, while types are intensional with members that can be constructed to be of certain types [1].

Certain types, value types, are structurally similar to sets. Bool consists of true and false, but this is not the case for all types.

Whether an instance is of a certain type can be mapped to a truth statement and whether an instance is of a member of a type union can be determined by the logical disjunction of the truth statements of the type members of the union.

In general, types are not sets. Thus, we should not assume to use set operators on types.

I am not claiming to understand this completely. I’m just noticing that some people do seem to be very particular about this matter.

[1] Types versus sets in math and programming languages | blog :: Brent -> [String]
[2] James H Morris. Types are not Sets. ACM-SIGACT Symposium on Principles of Programming Languages. https://dl.acm.org/doi/pdf/10.1145/512927.512938

Edit: Swapped the terms extensional and intentional to match the blog post. Thanks @Benny

1 Like

Isn’t there a proposal mechanism similar to PEPs in the Python community? It seems too hasty to me to create a PR based on a thread in this forum without further discussion.

Technically there is GitHub - JuliaLang/Juleps: Julia Enhancement Proposals but that is not really in use…

1 Like

That is the converse of what is written in the cited blogpost:

The slogan is that types are intensional , whereas sets are extensional .

Now, the author of that blog has math/CS degrees, while I have zero, but I’m not sure if this is true. Sets can be defined intensionally, especially convenient for infinite sets where we can’t just list all elements. And does it make sense to consider all types intensional? Concrete types, certainly. Parametric types like Complex maybe, the construction just depends on another concrete type. But Number itself doesn’t say anything about its instances, you just attach subtypes to it, and some of them will implement constructors. Singleton types and Enum seems properly extensional, you just specify all instances at definition and can retrieve them with instances.

For example, 2 ∈ (ℕ \ {3,5}) should evaluate to true , but we obviously can’t just enumerate all the elements of ℕ \ {3,5} since there are infinitely many.

There’s this other quote, but this actually seems feasible without enumeration: (2∈ℕ)∧(2∉{3,5}) = true∧true = true?

Granted, Julia types are indeed very different from sets:

  • An element can belong to arbitrary sets, an instance belonging to 1 concrete type cannot have arbitrary types. For example, the number 2 belongs to both the set of integers and the set of even integers, but a subtype Even<:Int is not possible, let alone 2::Even.
  • A type only has 1 immediate/parent supertype, so its other supertypes are related through a linear chain of supertyping. On the other hand, a set has supersets that are not supersets of each other.
  • Concrete types can’t subtype each other, and abstract subtyping does not reflect sets for sensible dispatch. For example, Real does not subtype Complex{T<:Real}.
  • In general, the set of possible outputs of a particular call signature is not reflected in the type system, even for constructors. For example, I could define struct Even to wrap Int and the inner constructor of course only instantiates even inputs. I can (very recently) x = reinterpret(Even, 3) to get around the constructor’s restriction, and x isa Even evaluates true just fine.
  • Almost goes without saying at this point, but types have no analog to arbitrary set definitions and operations.

Just my 2c here. I like the (A|B) syntax and I see me using it if available. The alternative with unicode characters I’ll never use. Having a unicode character in the code is not worth the trouble.

9 Likes

Well, considering sets extensionally makes a very strong statement: Two definitions create the same set if they happen to have the same elements (“…in our model universe”).

Types are intensional, in the sense that: Two definitions create the same type if there is a proof (…in our specific proof system) that they are the same.

These are conceptually and philosophically very different things! Not all true statements have proofs (under most interpretations of “true”, “statement” and “proof”).

Maybe you want same-ness / equivalence of types to be decidable? (…ok, not the case in julia). Strong argument for intensional types.

Maybe you don’t like to deal with counter-intuitive monstrosities that mathematicians came up with in the set-theoretic view? Hello Grundlagendebatte, constructivism, intuitionism.

Both set-theoretic and type-theoretic foundations have their place and are useful. Conflating the notation is good most of the time, but really really bites you if you need to talk specifically about weird corner cases.

Since julia is a programming language, not the standard mathematical notation, it is probably OK to conflate / pun the notation. (saying “it is OK” doesn’t mean that it is necessarily a good idea – I think it is a good idea, but reasonable people may think differently)

These don’t seem to be true?

  • A set has an extensional definition and many possible intensional definitions, and it remains true that a set is identified by extension no matter which definition is used.
  • Assuming for the moment that you could describe a type definition as extensional or intensional, @enum Fruit apple orange kiwi is clearly extensional.
  • Identical definitions making the same thing isn’t related to intensionality at all. Identical extensional definitions obviously make the same set, and different intensional definitions can make the same set. If different type definitions can’t make the same type, then maybe the analogy to sets is too poor to call them intensional.