Proposed alias for union types

You wouldn’t need to implement the |(::Type,::Type) methods for custom printing, either. I still prefer printing out Union, though: the meaning is right in the name and it stands out as a delimiter. Tuple{A|B,C,D|E|F,G|H} is shorter than Tuple{Union{A,B},C,Union{D,E,F},Union{G,H}}, but my eyes can better tell where and what the elements are in the latter. Skill issue on my part, I’ll admit.

3 Likes

I admit this also looks a bit ugly to me… This is a bit better:

Tuple{(A|B),C,(D|E|F),(G|H)}

Anyways I never proposed to change the printing in Julia. I just wanted to give my :+1: to the idea of opt-in type printing for Cthulhu.jl or VSCode display since it’s more concise.
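
For illustration only, here is a minimal sketch of what such opt-in printing could look like. The name pipe_repr is hypothetical (it is not part of Julia, Cthulhu.jl, or the VSCode extension), and Base.uniontypes is an internal, unexported Base helper, so treat this as an assumption-laden sketch rather than an actual implementation:

```julia
# Hypothetical helper: render a type with `|` and parentheses instead of `Union{...}`.
# `Base.uniontypes` is internal to Base and may change between Julia versions.
pipe_repr(x) = string(x)  # fallback for non-type parameters, UnionAll, etc.
pipe_repr(U::Union) = "(" * join(map(pipe_repr, Base.uniontypes(U)), "|") * ")"
function pipe_repr(T::DataType)
    isempty(T.parameters) && return string(nameof(T))
    return string(nameof(T), "{", join(map(pipe_repr, T.parameters), ","), "}")
end

struct A end
struct B end
struct C end

# The order of the union members in the output is decided by Julia, not by this helper.
println(pipe_repr(Tuple{Union{A,B},C}))
```

Normal show methods stay untouched, so anything that relies on the default Union{...} output keeps working.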

3 Likes

I could live with mandatory parentheses in printing. I’d still miss the explicit Union especially when types get really long, but distant commas aren’t any more helpful as delimiters than | in that case, so what matters is there are parentheses or braces for a text editor to match and highlight.

1 Like

Here’s my analysis to follow up on @Tamas_Papp’s smart suggestion to do a frequency analysis.

I will preface this by stating that there are a ton of things wrong with the analysis I am doing:

  • Only inline comments removed, but not blocks
  • Strings aren’t excluded (which is why |, \ and ' are pretty high)
  • Some symbols overlap, so to fix this I literally just subtract the count of the small symbols from any matching large symbols.
  • Some symbols can be used in function names (!)
  • Union also appears as UnionAll, so I used a regexp boundary
  • I then manually remove characters from the list which are undefined (presumably used in comments?) or are just prefix/suffix operators
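
A tiny sketch of two of the caveats above (the word boundary for Union and the overlap subtraction), run on a made-up snippet rather than on the real /base sources:

```julia
# Made-up input, only for illustration.
snippet = "x::Union{A,B} where {A<:UnionAll}; a == b; c = d"

# Without `\b`, `UnionAll` is counted as `Union` too.
n_union_naive = count("Union", snippet)
n_union_exact = count(r"\bUnion\b", snippet)

# Overlap: every `==` also contains `=`, so the raw `=` count is inflated.
n_eq_raw = count("=", snippet)
n_eqeq   = count("==", snippet)
n_eq_adj = n_eq_raw - n_eqeq   # the subtraction rule described in the list

@show n_union_naive n_union_exact n_eq_raw n_eqeq n_eq_adj
```

Here the adjusted `=` count still includes one `=` from inside `==`, which is one more reason the numbers should be read as approximate.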

The “proper” way to do this would be directly on the AST rather than strings.
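
As a rough sketch of what the AST route could look like (this is not the script used for the numbers in this post, and the function name tally! is made up), one can parse the source and walk the resulting Expr tree. Comments and string contents then never show up as operators:

```julia
# Recursively tally call names (operators included) and `Union{...}` uses.
function tally!(counts::Dict{Symbol,Int}, ex)
    ex isa Expr || return counts
    if ex.head === :call && ex.args[1] isa Symbol
        counts[ex.args[1]] = get(counts, ex.args[1], 0) + 1
    elseif ex.head === :curly && ex.args[1] === :Union
        counts[:Union] = get(counts, :Union, 0) + 1
    end
    foreach(arg -> tally!(counts, arg), ex.args)
    return counts
end

# Made-up one-liner; the `|` inside the comment is not part of the AST.
src = "f(x::Union{Int,String}) = (x | 1) + (x | 2)  # a | in a comment"
counts = tally!(Dict{Symbol,Int}(), Meta.parse(src))
@show counts
```

This only looks at :call and :curly nodes. Operators that have their own expression head (such as =, ::, && or .) would need extra branches.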

This list shows the most common infix operators in /base and /stdlib (excluding ones with alphabetical characters, like in or isa), with Union inserted for comparison:

Symbol Count
= 79141
. 73990
: 66847
:: 38905
- 16897
! 15273
== 12940
> 11865
\ 11684
9779
* 9215
+ 7367
& 7155
&& 6421
| 4618
<: 4188
/ 4147
=== 3496
< 2719
Union 2457
% 2202
-> 2045
1821
!= 1499
+= 1242
^ 1234
<= 1199
!== 800
<< 658
>= 528
// 401
-= 358
>> 337
~ 228
191
|= 170
*= 156
140
133
|> 112
>>> 96
88
87
÷ 82
&= 71
61
59
53
>: 47
39
⊻= 36
>>= 29
29
29
26
/= 21
<<= 19
17
17
17
16
>>>= 14
14
13
12
11
11
10
10
9
7
7
^= 6

I’ll leave the interpretation to the reader…

Here’s my quick-and-dirty script:

using DataFrames: DataFrame

symbols = Union{Regex,String}[r"\bUnion\b", "!", "!=", "!==", "::", "\$", "\$=", "%", "%=", "&", "&&", "&=", "'", "*", "*=", "+", "++", "+=", "-", "-->", "-=", "->", ".", "..", "...", ".<|", ".|>", "/", "//", "//=", "/=", ":", ":=", "<", "<--", "<-->", "<:", "<<", "<<=", "<=", "<|", "=", "==", "===", ">", ">:", ">=", ">>", ">>=", ">>>", ">>>=", "\\", "\\=", "^", "^=", "|", "|=", "|>", "||", "~", "¦", "¬", "±", "·", "×", "÷", "÷=", "·", "…", "⁝", "⅋", "←", "↑", "→", "↓", "↔", "↚", "↛", "↜", "↝", "↞", "↠", "↢", "↣", "↤", "↦", "↩", "↪", "↫", "↬", "↮", "↶", "↷", "↺", "↻", "↼", "↽", "⇀", "⇁", "⇄", "⇆", "⇇", "⇉", "⇋", "⇌", "⇍", "⇎", "⇏", "⇐", "⇒", "⇔", "⇚", "⇛", "⇜", "⇝", "⇠", "⇢", "⇴", "⇵", "⇶", "⇷", "⇸", "⇹", "⇺", "⇻", "⇼", "⇽", "⇾", "⇿", "∈", "∉", "∊", "∋", "∌", "∍", "−", "−=", "∓", "∔", "∗", "∘", "∙", "√", "∛", "∜", "∝", "∤", "∥", "∦", "∧", "∨", "∩", "∪", "∷", "∸", "∺", "∻", "∽", "∾", "≀", "≁", "≂", "≃", "≄", "≅", "≆", "≇", "≈", "≉", "≊", "≋", "≌", "≍", "≎", "≏", "≐", "≑", "≒", "≓", "≔", "≕", "≖", "≗", "≘", "≙", "≚", "≛", "≜", "≝", "≞", "≟", "≠", "≡", "≢", "≣", "≤", "≥", "≦", "≧", "≨", "≩", "≪", "≫", "≬", "≭", "≮", "≯", "≰", "≱", "≲", "≳", "≴", "≵", "≶", "≷", "≸", "≹", "≺", "≻", "≼", "≽", "≾", "≿", "⊀", "⊁", "⊂", "⊃", "⊄", "⊅", "⊆", "⊇", "⊈", "⊉", "⊊", "⊋", "⊍", "⊎", "⊏", "⊐", "⊑", "⊒", "⊓", "⊔", "⊕", "⊖", "⊗", "⊘", "⊙", "⊚", "⊛", "⊜", "⊞", "⊟", "⊠", "⊡", "⊩", "⊬", "⊮", "⊰", "⊱", "⊲", "⊳", "⊴", "⊵", "⊶", "⊷", "⊻", "⊻=", "⊼", "⊽", "⋄", "⋅", "⋆", "⋇", "⋉", "⋊", "⋋", "⋌", "⋍", "⋎", "⋏", "⋐", "⋑", "⋒", "⋓", "⋕", "⋖", "⋗", "⋘", "⋙", "⋚", "⋛", "⋮", "⋯", "⋰", "⋱", "⌿", "▷", "⟇", "⟑", "⟕", "⟖", "⟗", "⟰", "⟱", "⟵", "⟶", "⟷", "⟹", "⟺", "⟻", "⟼", "⟽", "⟾", "⟿", "⤀", "⤁", "⤂", "⤃", "⤄", "⤅", "⤆", "⤇", "⤈", "⤉", "⤊", "⤋", "⤌", "⤍", "⤎", "⤏", "⤐", "⤑", "⤒", "⤓", "⤔", "⤕", "⤖", "⤗", "⤘", "⤝", "⤞", "⤟", "⤠", "⥄", "⥅", "⥆", "⥇", "⥈", "⥉", "⥊", "⥋", "⥌", "⥍", "⥎", "⥏", "⥐", "⥑", "⥒", "⥓", "⥔", "⥕", "⥖", "⥗", "⥘", "⥙", "⥚", "⥛", "⥜", "⥝", "⥞", "⥟", "⥠", "⥡", "⥢", "⥣", "⥤", "⥥", "⥦", 
"⥧", "⥨", "⥩", "⥪", "⥫", "⥬", "⥭", "⥮", "⥯", "⥰", "⥷", "⥺", "⦸", "⦼", "⦾", "⦿", "⧴", "⧶", "⧷", "⧺", "⧻", "⨇", "⨈", "⨝", "⨟", "⨢", "⨣", "⨤", "⨥", "⨦", "⨧", "⨨", "⨩", "⨪", "⨫", "⨬", "⨭", "⨮", "⨰", "⨱", "⨲", "⨳", "⨴", "⨵", "⨶", "⨷", "⨸", "⨹", "⨺", "⨻", "⨼", "⨽", "⩀", "⩁", "⩂", "⩃", "⩄", "⩅", "⩊", "⩋", "⩌", "⩍", "⩎", "⩏", "⩐", "⩑", "⩒", "⩓", "⩔", "⩕", "⩖", "⩗", "⩘", "⩚", "⩛", "⩜", "⩝", "⩞", "⩟", "⩠", "⩡", "⩢", "⩣", "⩴", "⫛", "⬰", "⬱", "⬲", "⬳", "⬴", "⬵", "⬶", "⬷", "⬸", "⬹", "⬺", "⬻", "⬼", "⬽", "⬾", "⬿", "⭀", "⭁", "⭂", "⭃", "⭄", "⭇", "⭈", "⭉", "⭊", "⭋", "⭌", "←", "↑", "→", "↓"]

# Find all files recursively in base/ and stdlib/ that end in .jl:
files = let allfiles = []
    for dir in ("base", "stdlib")
        for (root, dirs, files) in walkdir(dir)
            append!(allfiles, collect((file -> joinpath(root, file)).(files)))
        end
    end
    unique!(allfiles)
    filter!(file -> endswith(file, ".jl"), allfiles)
    String[allfiles...]
end
code = let c=""
    for file in files
        contents = split(read(file, String), '\n')
        # Remove all text after `#`:
        contents = join(map(line -> replace(line, r"#.*$" => ""), contents), '\n')
        c *= contents
    end
    c
end

# Count the number of occurrences of each symbol:
counts = let c=Dict{Union{String,Regex}, Int}()
    for symbol in symbols
        c[symbol] = count(symbol, code)
    end

    # Now, we walk through the symbols, and subtract
    # the counts of any small symbols which are
    # substrings of the current symbol:
    for s_small in symbols, s_large in symbols
        s_small == s_large && continue
        s_large isa Regex && continue
        if occursin(s_small, s_large)
            c[s_small] -= c[s_large]
        end
    end
    c
end

df = let d=DataFrame(symbols=symbols, counts=[counts[symbol] for symbol in symbols])
    sort!(d, :counts, rev=true)
    d
end

# Print to markdown table:

open("counts.txt", "w") do io
    println(io, "| Symbol | Count |")
    println(io, "|--------|-------|")
    for row in eachrow(df)
        println(io, "| $(row[:symbols]) | $(row[:counts]) |")
    end
end

I basically just took the julia-parser.scm code and did a search for those symbols, then removed overlaps. Then cleaned up as described above.

1 Like

I’m not really a fan of overloading | for types. The | operator already means bitwise-or. Even if we relax the meaning of | from bitwise-or to logical-or, the meaning is still different from set union (for types). In Julia, especially in Base Julia, a generic function should have only one generic meaning, not two.

1 Like

Using the same name for two very different meanings (here | for both bitwise or and type union) is called “punning” and is generally considered problematic: given an expression like output = y | z, you can no longer tell how output will behave without knowing the types of y and z.

See, e.g., the SciML Style Guide for Julia: Functions mean one thing.

9 Likes

Just reuse union then: Int ∪ Float64 reads just as nice as Int | Float64, and there are no “puns”.

5 Likes

^ I will add my +1 that ∪ is also good! I would happily settle for it (but see points here)


I just want to emphasize that it is perfectly reasonable to say “I don’t like the look of it”. Some arguments against it in this thread seem to try to build some logical reason for why we ought to avoid it. But aesthetics are a perfectly valid reason!

I think that | meaning both “bitwise or” and “this type or this type” is actually perfectly in line with this. And this is exactly why it is being adopted in many contemporary languages – | is not simply some symbol that wasn’t yet taken, it was chosen purposefully.

Maybe the missing piece for why some like it and some don’t is the following analogy:

(x ∈ A) | (x ∈ B) <=>  x ∈ (A | B) 

“x in A or x in B is equivalent to x in A or B”

This is sort of how I read | in code in other languages.
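
In today's Julia the analogy can be checked with isa and Union. This is a small sketch with arbitrarily chosen types S and T:

```julia
S, T = Int, String

for x in (1, "one", 2.0)
    lhs = (x isa S) | (x isa T)   # Bool "or" of two membership tests
    rhs = x isa Union{S, T}       # one membership test against the union
    @assert lhs == rhs
    println(repr(x), " => ", rhs)
end
```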

Many existing infix operators actually break this principle and mean fundamentally different things, like * meaning multiplication or string concatenation, ' meaning transpose or char, . meaning getproperty or broadcast. But | being “bitwise or” or “type union” is pretty much the same meaning but in the context of different input types.

2 Likes

I am not sure what you mean, but note that the ' in 'a' is not an operator, it is syntax.

Yes, those are parsed differently depending on where the parser encounters them:

julia> dump(quote a.b end)
Expr
  head: Symbol block
  args: Array{Any}((2,))
    1: LineNumberNode
      line: Int64 1
      file: Symbol REPL[54]
    2: Expr
      head: Symbol .
      args: Array{Any}((2,))
        1: Symbol a
        2: QuoteNode
          value: Symbol b

julia> dump(quote a .+ b end)
Expr
  head: Symbol block
  args: Array{Any}((2,))
    1: LineNumberNode
      line: Int64 1
      file: Symbol REPL[55]
    2: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol .+
        2: Symbol a
        3: Symbol b

and then the .+ is lowered into a broadcasted call.
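
A short sketch of that last step, using the broadcasted and materialize functions from Base.Broadcast. The arrays are made up, and the comment describes the lowering only approximately:

```julia
using Base.Broadcast: broadcasted, materialize

ex = Meta.parse("a .+ b")
@show ex.head ex.args            # a :call whose first argument is the symbol .+

a, b = [1, 2], [10, 20]
lazy = broadcasted(+, a, b)      # roughly what `a .+ b` is lowered to
@show materialize(lazy) == a .+ b
```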

In contrast, your proposal is an alias. It is important to keep these things distinct.

I think that the community is actively engaging with your proposal. It is disingenuous to dismiss counterarguments as merely a matter of taste.

2 Likes

My comment was not intended in this way, apologies. I literally meant that I was interested in hearing both logical arguments and aesthetic ones. I had simply noticed that there were fewer aesthetic arguments than I expected :slight_smile: (I thought aesthetics would be most of this thread). I actually found your counterargument to be very insightful and a great idea which is why I did that analysis.

8 Likes

Maybe using ⊍, ⊎ or ∪̇ is a smaller ‘change’, as they are rarely used and their math semantics are closer to a disjoint union, which is actually closer to what a Type Union represents.

( \cupdot , \uplus and \cup\dot respectively )

1 Like

TBH I’m not sure about that. The union function (where union === ∪) is for a union of iterable collections, each interpreted as a set. So the meaning is still different than with Union.
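
A quick sketch of the difference, using only Base:

```julia
@assert union === ∪                 # two names for the same function
@show union([1, 2], [2, 3])         # set-style union of two collections
@show [1, 2] ∪ (2, 3)               # works across different iterables
@show 3 isa Union{Int, String}      # `Union` builds a type, not a collection
```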

7 Likes

I mean… Keep in mind that in this thread we are mostly discussing connotations of symbols and human thoughts evoked by |, not how the literal julia parser sees it. So I would push back on this and say that the lines get a bit blurred here.

1 Like

I quite strongly disagree — programming is fundamentally about naming. Functions have names. We decide what those names are and what they mean. Syntax is just about how we (both humans and machines) parse out the independent symbols (names).

Once you have a given symbol, there is power in knowing exactly what it means. It’s not just about a | b, but also about reduce(|, xs) and methods(|). If a name means multiple things, you dilute this power.
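
To make that concrete, here is a small sketch. The function type_or is a hypothetical stand-in for a type-union operator and is not part of Base:

```julia
flags = [0b001, 0b010, 0b100]
@show reduce(|, flags)                  # bitwise or across integers
@show reduce(|, [false, true, false])   # the same generic meaning on Bool

# Hypothetical stand-in for a type-union operator, under its own name.
type_or(S::Type, T::Type) = Union{S, T}
@show reduce(type_or, [Int, String, Nothing])

@show length(methods(|)) > 0            # the method table attached to the name `|`
```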

6 Likes

Maybe it was like that originally, but it has been commonly used in a wider context for quite some time already. Definitely for much more than iterable collections!

julia> using IntervalSets

julia> (1..2) ∪ (2..3)
1 .. 3

@mbauman I think we are saying the same thing but with different language maybe? Because I actually agree with you :sweat_smile: (Were you responding to my comment or a different one?)

I think the point we disagree on is whether OR (for Bool) and type unions (for Type) are the “same thing”, but in a multiple dispatch context. I think they are the same thing… maybe because I pronounce a Union{A,B} like “A or B”.

Maybe the reason why | is being adopted as both logical-or and type union in TypeScript, PHP, Python, and Scala (both 3 | 5 == 7 and String | Int == Union{String,Int} are true in all four of them) is because of this view? I’m not sure.

1 Like

It’s true that there are many instances of punning in the ecosystem. It can be hard to completely avoid punning: the ∪ operator is probably the best choice for IntervalSets, even though it technically doesn’t conform to the generic definition in Base. Perhaps they could shadow instead of overloading Base.union.

But at least we should avoid punning in Base.

1 Like

For TypeScript and Scala, my uneducated guess is that since they are static languages, the | symbol is fundamentally different when it is in type context and value context. The same thing basically applies to Python also, since type hints are meant for static analysis.

But for Julia there is only value context.
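
A small sketch of what "only value context" means in practice. It assumes a session where no package has added a | method for types:

```julia
T = Union{Int, String}          # a type is an ordinary value bound to a name
@show T isa Type
@show typeof(T)
choices = [Int, String, T]      # types can be stored in containers like any value

# `Int | String` is an ordinary function call on two values. Base defines no
# such method (assumption: nothing else in the session has added one).
err = try
    Int | String
catch e
    e
end
@show err isa MethodError
```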

Just for reference, Python type unions can be used for dynamic behavior on types too (and you often need to, since Python doesn’t have multiple dispatch) –

def f(x: float | int | str):
    if isinstance(x, float | int):
        return x ** 2
    elif isinstance(x, str):
        return x + " world"

f(2) # 4
f("hello") # hello world

2 Likes

Interesting!

In [1]: type(float | str)
Out[1]: types.UnionType

In [2]: 1 | 2
Out[2]: 3

But Python does not have as strong of an emphasis on generic programming as Julia does…

2 Likes