It’s been 11 years since the release of Julia 0.1; my point is that this being the first time someone has even voiced a desire for a more concise Union syntax is not exactly evidence of a burning need.
Yes, but the issue has only arisen in the last few years because this syntax has recently become popular, particularly in TypeScript and Python. Precedent did not exist 11 years ago. It very much does now.
I can confess that the only reason I considered this idea at all (and thus why I started this thread) was because I started noticing this syntax more and more frequently in Python libraries.
It doesn’t seem unusual to me that it hasn’t been thought of until now… that’s just how languages evolve, by cross-breeding and mutating ideas in a larger ecosystem of other languages. And this syntax has only recently started becoming a dominant species.
I’ve actually seen this brought up on Slack and Discourse, but the syntax itself doesn’t add any new capabilities so someone would have to make a PR and bikeshed with the understanding that there’s a very real possibility that the proposal is not accepted.
There is a conflicting usage. In Python, T1 | T2 is a union type. In functional languages, T1 | T2 is a sum type (aka tagged union), not a union type.
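A minimal illustration of the difference, using only Base: a Julia Union is untagged, so repeated members simply collapse, whereas a sum type keeps each case distinct even when two cases carry the same payload type. The Left/Right wrappers below are hypothetical hand-rolled stand-ins for what a sum-type package generates; they are not SumTypes.jl's API.

```julia
# An untagged union: duplicate members collapse into a single type
collapsed = Union{Int, Int}    # this is just Int

# A hand-rolled tagged union (sum type): each case is its own wrapper type,
# so two cases can share a payload type (Int) and still be told apart
struct Left
    value::Int
end
struct Right
    value::Int
end
const Either = Union{Left, Right}

# Dispatch on the tag recovers which case a value belongs to
side(x::Left) = :left
side(x::Right) = :right
```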
Julia users have been exploring sum types eg in SumTypes.jl. These tend to be especially useful in combination with pattern matching.
A tagged union is pretty similar to a sum type. In Julia, the main difference would be giving each union member a unique name identifier.
It’s consequential because, for one thing, when pattern matching on one member of a sum type, you can statically check that you’re not missing any branches. You can’t do that with a value that doesn’t know which other values are possible.
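To make the exhaustiveness point concrete: once the set of cases is written down as a closed Union, reflection can at least report which members a function has no method for. The shape types and the missing_cases helper are hypothetical, and Base.uniontypes is an unexported Base internal that may change between releases. This is a runtime check, not the static check a sum type with pattern matching would give you.

```julia
struct Circle
    r::Float64
end
struct Square
    side::Float64
end
const Shape = Union{Circle, Square}

area(c::Circle) = π * c.r^2   # deliberately no method for Square yet

# Members of the closed Union `U` for which `f` has no one-argument method.
# Base.uniontypes is an unexported Base internal.
missing_cases(f, U::Union) = [T for T in Base.uniontypes(U) if !hasmethod(f, Tuple{T})]
```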
But I think this would always be inlined to Union{t...}, if not evaluated at compile time.
I don’t think there is a guarantee, but @inline might help? In any case, | chains lower to a sequence of binary operations, so you’d have to write out or append that expression for benchmarks. The earlier talk about functions calling (|)(...) calling Union{...} should be looked into, but is ultimately an aside.
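A quick way to check the point about chaining is Meta.parse: the parser flattens a + chain into one n-ary call, but nests a | chain left-associatively. The tunion function is a hypothetical stand-in for the proposed method, defined under its own name so that it does not pirate Base.:|.

```julia
# `+` chains are flattened by the parser into a single n-ary call
plus_chain = Meta.parse("a + b + c")   # same as :(+(a, b, c))

# `|` chains are nested binary calls: (a | b) | c
bar_chain = Meta.parse("a | b | c")

# Hypothetical stand-in for the proposed Base.:|(::Type, ::Type)
tunion(a::Type, b::Type) = Union{a, b}

# A chain of binary calls builds the same type as one variadic Union{...}
chained = tunion(tunion(Int, Float64), String)
```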
to force type specialization for the binary version?
I could see this quickly blowing up the number of methods though.
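The two variants being discussed might look roughly like the sketch below (hypothetical names again, to stay clear of Base.:|). Per the performance tips in the Julia manual, Julia may decline to specialize on an argument annotated only as ::Type when it is just passed through to another function, while ::Type{A} with a type parameter forces a specialization for each distinct argument type. That per-combination specialization is where the concern about many compiled method instances comes from.

```julia
# Variadic version: a single method; Julia may not specialize on the ::Type arguments
tunion_any(ts::Type...) = Union{ts...}

# Binary version with explicit type parameters: specializes for every (A, B) pair
tunion_spec(::Type{A}, ::Type{B}) where {A, B} = Union{A, B}
```

Both build the same types; the difference is only in how much gets compiled, e.g. folding the binary version over a tuple of types produces the same union as the variadic one.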
Yeah, that wouldn’t be good. I wonder how the underlying variadic function in Core.apply_type works without having to contend with specialization/recompilation. I tried reading up on variadic functions in C but couldn’t find a simple, straight answer, and the definitions in builtins.c seem to be macros that I just couldn’t expand in my head.
Arguably, the thread “PSA: Julia is not at that stage of development anymore” applies here.
Still possible if it’s not breaking, but yeah I don’t see it happening if it doesn’t add some capability or improve things a lot for people. Feature bloat isn’t desirable.
It’s not a breaking change, just a new method. Since | currently throws an error on ::Type input, this would not affect existing code.
Existing code may handle the call assuming the signature is not implemented, so undoing that is still considered type piracy. That doesn’t mean you can’t ever do it; maybe you want that change to fix bugs, or that call signature may really be never used so the change is isolated. You would have to test thoroughly to look for things going out of control and be open to bug reports. Involving new types is the only way to easily prove the call signatures only affect new code.
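A small illustration of how adding a method for a previously unimplemented signature can still change existing behaviour: code that probes for a method (here with applicable) takes a different branch once the method exists. The combine function is a hypothetical example of such code.

```julia
# Today `|` has no method for two types, so calling it throws a MethodError
threw = try
    Int | Float64
    false
catch err
    err isa MethodError
end

# Hypothetical existing code whose behaviour depends on that absence:
# it falls back to returning a tuple when `|` is not applicable
combine(a, b) = applicable(|, a, b) ? a | b : (a, b)

combine(0x1, 0x2)       # bitwise or on integers
combine(Int, Float64)   # today a tuple; would become a Union if Base.:| gained a Type method
```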
I can confess that the only reason I considered this idea at all (and thus why I started this thread) was because I started noticing this syntax more and more frequently in Python libraries.
I’d rather not need to learn cosmetic features just because some other language started doing it. I once went looking for a Python feature in Julia, specifically the walrus operator, which lets you assign and use a variable within a conditional statement and other one-liners. I learned to ask “can this be done easily already?” real fast when someone pointed out that not only could parenthesized assignment already do this, but you can also write multi-line conditions in let and begin blocks with a boolean value on the last line.
@jar1 I don’t think this would necessarily be incompatible with sum types, because you would need a different syntax to describe the tagging anyway. In other words, Base.:|(::Vararg{Type}) is not enough information to create a sum type.
If Base.:| is defined in this way in Base, then SumTypes could look at overloading Base.:|(::Vararg{SumTypes.Tagged}), which IMO is a more palatable syntax if Int | Float64 starts being allowed for untagged/normal unions. So basically we are on the same team here.
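If Base did define a generic method, a package could still layer a more specific method on top, and dispatch would pick it for the package’s own types while everything else falls through to the generic one. The sketch uses local stand-ins: typeor plays the role of the hypothetical Base method, and Tagged is a hypothetical package-owned abstract type (SumTypes.jl’s real internals are not modelled). It also assumes the tagged cases are passed as types, so the signature uses Type{<:Tagged} rather than Tagged itself.

```julia
# Hypothetical package-owned marker type and two cases (not SumTypes.jl's API)
abstract type Tagged end
struct CaseA <: Tagged end
struct CaseB <: Tagged end

# Stand-in for a generic Base method on any types: builds a plain Union
typeor(ts::Type...) = Union{ts...}

# Stand-in for a package method on its own tagged types. It is more specific,
# so dispatch prefers it; the returned tuple is a placeholder for a real sum type.
typeor(ts::Type{<:Tagged}...) = (:sum_type, ts)
```

Mixed calls such as typeor(CaseA, Int) match only the generic method and fall back to an ordinary Union.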
Although, since SumTypes requires creating new structs which are not actually Unions, you would still need a macro to pull this off… In any case, SumTypes is an external package, not in Base, and is still at an experimental stage (I use it in one of my packages; it’s great!), so I’m not sure how practical it is to discuss here.
@Benny you have mentioned type piracy a few times but I am not sure why. Just to be clear, the idea we are discussing is that Base.:|(::Vararg{Type}) could be added to Base itself. This is not type piracy. Only if I defined that method myself in one of my own packages would it be type piracy.
Correct, except that a pirating method could be defined from any module, not just packages. The point was that you can’t assume implementing an unimplemented call signature won’t affect existing code, because you’re making the same kind of change that type piracy makes. The difference from type piracy is that by contributing to Base directly, you take on the responsibility to find and handle any bugs that arise from the change. The principle that new types are the only easy proof of no effect on existing code applies whether you’re pirating or contributing.
FWIW, before union types we should discuss unions of sets.
Currently | does not compute the union of sets, and & does not produce their intersection.
I believe that | for union types comes from unions of sets, under the interpretation that a type represents a set and isa represents elementOf.
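For reference, this is the current state of affairs in Base: sets use ∪ and ∩ (aliases for union and intersect), | and & have no methods for sets, and isa on a Union behaves like membership in a union of sets. The sets and values below are arbitrary examples.

```julia
s, t = Set([1, 2]), Set([2, 3])

# Set union and intersection use ∪ / ∩
st_union = s ∪ t
st_inter = s ∩ t

# Reading `isa` as "element of": membership in a Union is the "or" of memberships
isa_matches = all((x isa Union{Int, String}) == (x isa Int || x isa String)
                  for x in (1, "a", 1.0))
```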
I believe that | for unions of sets comes from the duality between predicates and sets, i.e. {x : p(x)} ∪ {x : q(x)} = {x : p(x) | q(x)}; given that we were restricted to ASCII / typewriter back when this convention was made, it makes sense. It also meshes nicely with bitwise-or and bags-of-flags.
If we want to continue that grand tradition of conflating logical operations on predicates with set-ops on their extents, then we should also do that for sets, not just for types.
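The predicate/set duality can be checked directly on a finite universe; the universe U and the predicates p and q below are arbitrary examples. The last lines show the bags-of-flags reading of bitwise |, with hypothetical flag names.

```julia
U = 1:20
p(x) = iseven(x)
q(x) = x % 3 == 0

# {x : p(x)} ∪ {x : q(x)}  versus  {x : p(x) | q(x)}
lhs = Set(filter(p, U)) ∪ Set(filter(q, U))
rhs = Set(x for x in U if p(x) | q(x))

# Bags of flags: each bit is a predicate, and `|` takes the union of the flag sets
READ, WRITE = 0x1, 0x2
perms = READ | WRITE
```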
I personally like the Int32 | UInt32 notation a lot.
But the current Union{...} is not overly clunky, and my capacity for syntactic annoyances is fully used up by the lack of an ASCII infix xor (a multi-letter one would suffice, like /xor/ or something; it just needs to be parsed infix).
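For context, what exists today is the xor function (ASCII, prefix) and the non-ASCII infix ⊻, which the Julia REPL and most editors let you type as \xor followed by Tab. The values below are arbitrary examples.

```julia
a, b = 0b1100, 0b1010

prefix_form = xor(a, b)   # function-call form, ASCII only
infix_form = a ⊻ b        # infix form, non-ASCII
```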
Ideally only the most frequently used parts of the language should get dedicated infix syntax.
It is not apparent to me that Union is one of these. I usually use it less than once for every 100 LOC, and prefer to define a type alias if I end up repeating the same Union{...} declarations.
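The type-alias approach looks like this; the alias name MaybeInt and the two helper functions are illustrative, not from any package.

```julia
# Name a union once instead of repeating Union{...} in every signature
const MaybeInt = Union{Int, Nothing}

# Hypothetical helper: parse a string, returning `nothing` on failure
parse_maybe(s::AbstractString)::MaybeInt = tryparse(Int, s)

# Hypothetical consumer that accepts either member of the alias
describe(x::MaybeInt) = x === nothing ? "missing" : "value $x"
```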
I count roughly 1700 uses of Union (with grep -r base) in Julia Base code, which may be more than typical. Around 220 of these are Union{}, which this syntax would not change, and around 100 are comments.
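A rough Julia equivalent of that grep-based count, for anyone who wants to repeat it on other packages. count_union_lines is a hypothetical helper; unlike grep -r, it only looks at .jl files and treats a line as a comment only if it starts with #, so its numbers will not match the ones above exactly.

```julia
"""
    count_union_lines(root)

Walk `root` recursively and, over all `.jl` files, count lines that mention
Union, how many of those contain Union{}, and how many are comment lines.
Counts lines (like `grep | wc -l`), not individual occurrences.
"""
function count_union_lines(root::AbstractString)
    total = nempty = ncomment = 0
    for (dir, _, files) in walkdir(root), file in files
        endswith(file, ".jl") || continue
        for line in eachline(joinpath(dir, file))
            occursin("Union", line) || continue
            total += 1
            occursin("Union{}", line) && (nempty += 1)
            startswith(lstrip(line), "#") && (ncomment += 1)
        end
    end
    return (; total, nempty, ncomment)
end
```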
I don’t think that introducing special syntax for something typical code uses in approximately every 100th line makes sense.
Thanks for the analysis, @Tamas_Papp! I really like your approach of quantifying this sort of thing. While it can’t be a comprehensive way to study the problem (e.g., considering @mkitti’s points about users coming from other languages where | is used, which might change how much weight a particular LOC frequency deserves), it is a really nice way to frame things in a more objective light.
Do you have on hand what typical numbers are for other infix operators in Base? e.g., |(::Integer, ::Integer)?
I’m also very curious what the difference is with the greater package ecosystem. I can try to look at this later today if you don’t get a chance.
I did not make a comprehensive study, but e.g. for + it is more than 3300 counts using grep.
But note that typically Julia uses infix for arithmetic, and there is very little arithmetic done per se in Base. I would expect that in Base, Union is overrepresented and arithmetic is underrepresented compared to typical Julia packages. E.g. in LinearAlgebra:
tamas@tamas:~/src/julia-1.10/stdlib/LinearAlgebra$ grep '+' -r . | wc -l
1755
tamas@tamas:~/src/julia-1.10/stdlib/LinearAlgebra$ grep 'Union' -r . | wc -l
331
FWIW, in my own code it seems to be around one usage of Union for every 500 LOC. But maybe I am atypical.
But note that typically Julia uses infix for arithmetic
Aren’t all of :, ::, ∈, ==, ===, &&, ||, ->, =>, and even . infix operators?
Sure they are.
You may want to compile your own statistics based on the package ecosystem, but note that usage frequency is not the only motivation to make an operator infix. E.g. short-circuiting && and || have to be infix because they are not function calls, etc.
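Meta.parse makes the distinction visible: some of the infix operators listed above parse to ordinary function calls (expression head :call), while others are syntax with their own expression head, which is why they cannot simply be generic functions. The choice of sample expressions is arbitrary.

```julia
# Infix operators that parse to function calls: head :call
call_heads = [Meta.parse(src).head for src in ("a | b", "a == b", "a === b", "a => b", "a ∈ b", "a:b")]

# Infix forms that are syntax rather than calls: each gets its own head
syntax_heads = Dict(src => Meta.parse(src).head
                    for src in ("a && b", "a || b", "a -> b", "a::b", "a.b"))
```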
Also, you may find that some super-rarely used operators are infix, e.g. ++. But that does not imply that Union should have an infix version, just that the language designers left some room for expansion in the syntax.
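As an example of that reserved room: ++ parses as an infix operator, but Base gives it no methods, so user code can assign it a meaning. The vector-concatenation definition below is purely illustrative.

```julia
# Illustrative only: give `++` a meaning (vector concatenation) in your own module.
# The variadic signature accepts both binary calls and longer chains.
(++)(xs::AbstractVector...) = vcat(xs...)
```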
Would it be possible to test the usefulness of this syntactic sugar by using it in type reporting tools such as Cthulhu.jl or the VS Code extension?
I don’t think that introducing special syntax for something typical code uses in approximately every 100th line makes sense.
I’m not sure that this is the best metric, since a lot of conventional infix operators (e.g. %, &, \) are used in only a small fraction of Julia code … but when they are used, the code would be much less readable without them. Arithmetic operators are often deeply nested, which favors infix notation. Whereas Union{...} is almost never nested: it is typically used only at the outermost level of a type expression.
Whereas Union{...} is almost never nested
I’m not sure nestedness matters to liking or disliking the syntax (I still like it even at the top-level). I also think “almost never” is maybe a bit of an exaggeration…
We can also just quantify this. If I dump all the code in Base that matches Union, here are the last 50 lines:
Base.get_bool_env(name::String, default::Bool)::Union{Bool,Nothing}
code::Union{CodeInfo,Core.MethodInstance,Nothing}
mod::Union{Module,Nothing}
ret = Vector{Union{InterpreterIP,Ptr{Cvoid}}}()
code = bt2[j]::Union{CodeInfo,Core.MethodInstance,Nothing}
mod = njlvalues == 2 ? bt2[j+1]::Union{Module,Nothing} : nothing
A wrapper type used in `Union{Some{T}, Nothing}` to distinguish between the absence
promote_rule(T::Type{Nothing}, S::Type) = Union{S, Nothing}
return Union{R, Nothing}
R <: Union{} && error("cannot convert a value to nothing for assignment")
lcm(a::Union{Integer,Rational}) = gcd(a)
const HWReal = Union{Int8,Int16,Int32,Int64,UInt8,UInt16,UInt32,UInt64,Float32,Float64}
const HWNumber = Union{HWReal, Complex{<:HWReal}, Rational{<:HWReal}}
powermod(x::Integer, p::Integer, m::Union{Int128,UInt128}) = oftype(m, powermod(x, p, big(m)))
function isqrt(x::Union{Int64,UInt64,Int128,UInt128})
falses(dims::NTuple{N, Union{Integer, OneTo}}) where {N} = falses(map(to_dim, dims))
trues(dims::NTuple{N, Union{Integer, OneTo}}) where {N} = trues(map(to_dim, dims))
function unsafe_copyto!(dest::BitArray, doffs::Integer, src::Union{BitArray,Array}, soffs::Integer, n::Integer)
copyto!(dest::BitArray, doffs::Integer, src::Union{BitArray,Array}, soffs::Integer, n::Integer) =
function _copyto_int!(dest::BitArray, doffs::Int, src::Union{BitArray,Array}, soffs::Int, n::Int)
@propagate_inbounds function setindex!(B::BitArray, X::AbstractArray, J0::Union{Colon,AbstractUnitRange{Int}})
function splice!(B::BitVector, r::Union{AbstractUnitRange{Int}, Integer}, ins::AbstractArray = _default_bit_splice)
function splice!(B::BitVector, r::Union{AbstractUnitRange{Int}, Integer}, ins)
($f)(A::Union{BitMatrix,BitVector}, B::Union{BitMatrix,BitVector}) = ($f)(Array(A), Array(B))
(>>)(B::BitVector, i::Union{Int, UInt}) = B >>> i
function findnext(pred::Fix2{<:Union{typeof(isequal),typeof(==)},Bool},
function findprev(pred::Fix2{<:Union{typeof(isequal),typeof(==)},Bool},
map(::Union{typeof(~), typeof(!)}, A::BitArray) = bit_map!(~, similar(A), A)
map!(::Union{typeof(~), typeof(!)}, dest::BitArray, A::BitArray) = bit_map!(~, dest, A)
for (T, f) in ((:(Union{typeof(&), typeof(*), typeof(min)}), :(&)),
(:(Union{typeof(|), typeof(max)}), :(|)),
(:(Union{typeof(xor), typeof(!=)}), :xor),
(:(Union{typeof(>=), typeof(^)}), :((p, q) -> p | ~q)),
function hcat(A::Union{BitMatrix,BitVector}...)
_cat(dims::Integer, X::Union{BitArray, Bool}...) = _cat(Int(dims)::Int, X...)
function _cat(dims::Int, X::Union{BitArray, Bool}...)
function _split_rest(a::Union{Vector, BitVector}, n::Int)
function rationalize(::Type{T}, x::Union{AbstractFloat, Rational}, tol::Real) where T<:Integer
/(x::Rational, y::Union{Rational, Integer, Complex{<:Union{Integer,Rational}}}) = x//y
/(x::Union{Integer, Complex{<:Union{Integer,Rational}}}, y::Rational) = x//y
env::Union{Vector{String},Nothing}
cpus::Union{Nothing,Vector{UInt16}}
cpus::Union{Nothing,Vector{UInt16}} = cmd.cpus,
function show(io::IO, cmds::Union{OrCmds,ErrOrCmds})
setup_stdio(stdio::Union{DevNull,OS_HANDLE,RawFD}, ::Bool) = (stdio, false)
const Redirectable = Union{IO, FileRedirect, RawFD, OS_HANDLE}
ignorestatus(cmd::Union{OrCmds,AndCmds}) =
byteenv(env::Union{AbstractVector{Pair{T,V}}, Tuple{Vararg{Pair{T,V}}}}) where {T<:AbstractString,V} =
pipeline(src::Union{Redirectable,AbstractString}, cmd::AbstractCmd) = pipeline(cmd, stdin=src)
parsed::Tuple{Vararg{Tuple{Vararg{Union{String, SubString{String}}}}}}
So it’s far from “almost never” nested…
Anyways once I find some time I can try to do a frequency analysis and also include nestedness for those interested.
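One way to include nestedness in such an analysis is to parse the code and walk the expression tree instead of grepping. The helpers below are a hypothetical sketch: a Union{...} counts as nested when it sits anywhere inside another curly-brace type expression (e.g. Vector{Union{A,B}} or Complex{<:Union{A,B}}), and as top-level otherwise.

```julia
"""
    union_nesting(ex)

Count Union{...} expressions in a parsed expression, split into top-level
occurrences and occurrences nested inside another type's curly braces.
"""
function union_nesting(ex)
    counts = Dict(:toplevel => 0, :nested => 0)
    _walk_unions!(counts, ex, false)
    return counts
end

function _walk_unions!(counts, ex, inside_curly::Bool)
    ex isa Expr || return counts           # symbols, literals, line nodes: nothing to do
    if ex.head === :curly && ex.args[1] === :Union
        counts[inside_curly ? :nested : :toplevel] += 1
    end
    # Everything below a curly expression counts as nested
    for arg in ex.args
        _walk_unions!(counts, arg, inside_curly || ex.head === :curly)
    end
    return counts
end
```

Applied to lines from the listing above, e.g. the / method with Complex{<:Union{Integer,Rational}} inside a Union, it reports one top-level and one nested occurrence.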
Would it be possible to test the usefulness of this syntactic sugar by using it in type reporting tools such as Cthulhu.jl or the VS Code extension?
I would agree that’s a good use case, as it definitely makes the printouts more concise! It could even happen without this change in Julia itself (as an opt-in configuration option) for users who are comfortable with | from TypeScript/Python/etc.
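A minimal sketch of what an opt-in | rendering could look like on the printing side, with no change to the language. bar_string is a hypothetical helper, and it relies on Base.uniontypes, which is unexported and may change between releases.

```julia
"""
    bar_string(T)

Render a type as a string, writing a top-level Union{A, B} as "A | B".
Relies on Base.uniontypes, an unexported Base internal.
"""
bar_string(T) = string(T)
bar_string(T::Union) = join(map(bar_string, Base.uniontypes(T)), " | ")
```

It only rewrites a Union at the outermost level; a union nested inside another type, such as Vector{Union{Int, Nothing}}, is still printed with the usual Union{...} syntax.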