Symbolics.jl uses Moshi.jl extensively, and it yields obscure types like SymbolicUtils.BasicSymbolicImpl.var"##Storage#AddMul"{T}. After reading Algebraic Data Type - Intro | Moshi, I still don’t get it. The motivating example is that
is type unstable. Then the solution is to create a type Message type whose field is typed as Union{Quit, Move, Write, ChangeColor}. But I don’t get how that’s so fundamentally different. [Quit(), Move(10,20), ...] can also have an eltype that is Union{Quit, Move, Write, ChangeColor}. Both cases hope-and-pray that the compiler will introduce union splitting appropriately. What’s the benefit of the “algebraic data type”?
(I get the benefit of @match though, but AFAICT it would work just as well if @data defined Message = Union{Quit, Move, Write, ChangeColor})
Wrapping a union into a struct is a standard way to achieve type stability, e.g. in list comprehension. Consider this:
julia> a1 = Union{Int, Float64}[1, 2.0];
julia> a2 = Union{Int, Float64}[1, 2];
julia> typeof([i+1 for i in a1])
Vector{Real} (alias for Array{Real, 1})
julia> typeof([i+1 for i in a2]) # Type of the array has changed!
Vector{Int64} (alias for Array{Int64, 1})
julia> struct MyType # Wrap the union into a struct
data::Union{Int, Float64}
end
julia> Base.:+(a::MyType, b) = MyType(a.data + b)
julia> a1_stable = map(MyType, a1)
2-element Vector{MyType}:
MyType(1)
MyType(2.0)
julia> a2_stable = map(MyType, a2)
2-element Vector{MyType}:
MyType(1)
MyType(2)
julia> typeof([i+1 for i in a1_stable])
Vector{MyType} (alias for Array{MyType, 1})
julia> typeof([i+1 for i in a2_stable]) # Type of the array remains unchanged.
Vector{MyType} (alias for Array{MyType, 1})
For performance, you’re still at the mercy of the compiler. The purpose is formal type stability in the interest of correctness rather than performance.
P.S. some performance benefit is also possible, since the proliferation of types is slightly tamed, as shown by the array types in my list comprehension example.
Wrapping a union into a struct allows you to control function barriers, i.e. where dynamic dispatch happens. This has huge performance implications, but which one is better depends on your specifics.
Consider
const TypeUnion = Union{Int, Float64}
foo(x) = nothing; #imagine real code here
struct Wrapped
member::TypeUnion
end
unwrapped = TypeUnion[1, 1.0]
wrapped = [Wrapped(1), Wrapped(1.0)]
function bar(a)
foo(a[1]) #critical dispatch!
end
If you call bar(unwrapped), then the foo call will be a (potentially slow, depending on union splitting) dynamic dispatch. In exchange, the specialized foo implementation will exactly know the types of its inputs, making it faster.
If you call bar(wrapped), then the foo call will be a fast static dispatch. But in exchange, the foo implementation will have to deal with not knowing the exact type of x.member, potentially making it slower and incurring (potentially slow) dynamic dispatch if it calls anything with x.
So you have to judge / trade-off:
Will foo just pass through its parameters to other callees, without unwrapping them? Wrap the union into a struct. No function barrier.
Will foo do computations on the union-typed arguments, possibly in a loop? Pass the parameter unwrapped, i.e. push the dynamic dispatch up the call-stack. Standard design principle in julia, cf function barrier to enable type stability.
Correction: my thread needs corrections I described Unityper. While Julia’s Union optimizations do work similarly at the LLVM level, Moshi.jl does work differently from what I describe except for the pattern matching part. It relies on the Julia union splitting optimizations.
No, that is a major misunderstanding of what the type is. Message is not a type whose field is typed as Union{Quit, Move, Write, ChangeColor}. Message is a type whose fields are a union of the fields used by non-existent types {Quit, Move, Write, ChangeColor}. Those are pseudo-types, or varients. There is only one type: Message. It looks like:
struct Message
varient::Symbol
x::Int
y::Int
message::String
r::Float64
g::Float64
b::Float64
end
Now since the different varients don’t share any fields, this isn’t a great example, but if for example Message also had a ::Float64 then what you’d want to do is combine the two in a field and then getproperty alias them to save memory. But no matter, it’s a type that is a union of fields. So then effectively:
Now when you write functions over an Array{Message}, you may want to dispatch over varients, in which case you do explicit if statements:
function myfunction(m::Message)
if m.varient == :Quit
# Do something
elseif m.varient == :Move
# Do something
...
end
This is what the pattern matching macros help you do:
@match message begin
Message.Quit() => "Quit"
Message.Move(x, y) => "Move to $(x), $(y)"
Message.Write(msg) => "Write: $msg"
Message.ChangeColor(r, g, b) => "Change color to ($r, $g, $b)"
_ => "Unknown"
end
just writing out big if statements.
So then with this in mind:
This relies on compiler optimization effectively doing this under the hood which relies on certain heuristics being hit. Moshi does not rely on those compiler optimizations at all.
Nope, you’re not at the mercy of the compiler because there is no union. Everything is statically typed, no dynamic dispatching. You trade dynamic dispatch for static dispatching in terms of matching, where each time you want to do things you need to write things based on varients.
It’s not slightly tamed, there is a guarantee that there is exactly one type, and Julia has a guarentee to then not dynamic dispatch.
Yes, but algebraic data types are not wrapping a union into a struct, so this analysis is not related to what’s actually going on with Moshi. The calls with Moshi will not have dynamic dispatch in your example, instead just an if statement that is statically handled because it would be a single struct
struct Wrapped
varient::Symbol
member1::Int
member2::Float64
end
and then you’d branch based on the varient. There is no type-instability for the compiler to handle.
This is a misunderstanding of what Moshi is. It absolutely does use unions internally. You appear to be thinking of Unityper.jl which Symbolics used to use, and it did something similar to what you are describing.
Edit: Oops: @ericphanson beat me to it, sorry for the noise.
Yeah, if you cannot guarantee that the types going into the union are mutually disjoint, then you need to wrap them again to force the issue (and you need to keep the wrapper package-private), so generically ADT sums in julia must be wrapped-union-of-wrappers, to property deal with the likes of Some(Some(Some(nothing))) in a Maybe.
Alas, the extra wrapper can introduce extra overhead, both in julia runtime and in notation. Mathematicians abuse notation all the time with \cup / union when they actually mean \dot\cup / \coprod / disjoint union / coproduct, as long as the sets could be proven disjoint by a sufficiently trivial argument.
Opposed to some pure functional programming people, I think this pragmatism (aka sloppiness) is a good thing.
…going the other direction, in terms of language support, is hard! I have yet to see the ADT functional programming people who are so proud of their sum types implement humble union types as pushout / fibred coproduct.
I read your first post and thought “wow, I really didn’t understand anything when reading Moshi’s docs, this post should go into it because it’s quite comprehensive”…
… but now I think that it’s beginning should be edited to mentions that this doesn’t describe Moshi (but Unityper apparently)
How is it more correct than typing with ::Message where Message = Union{Quit, Move, Write, ChangeColor}?
That’s an interesting point. Is it any different performance-wise from using @nospecialize? Incidentally, is there any package that implements @match without the wrapping?
The difference between a Moshi sum type and a Julian union type is that the former wraps the latter. You get the same effect with a normal struct with a union inside. I.e, this is the same as asking what the difference is between U and X below:
const U = Union{A, B, C, D, E}
struct X
x::U
end
Let’s look at some differences:
Suppose you have a function f()::U. Now, what is the type of [f()]? Is it Vector{U}? Nope! It’s going to be a vector of one of the subtypes. However, for f()::X, [g()] is indeed a Vector{X}. This is part of a more general pattern where any operation on an instance of U which uses the type of that instance will not use U, but rather the more specific, concrete type. If the concrete type is only known at runtime (and it usually is, for the use cases of sum types), then you have type instability.
When the compiler sees an X, you know that whoever wrote the code explicitly designed X to have the five variants A to E. However, when the compilser sees a U, this may have been generated by a process where there is general uncertainty about the type produced. Applying union-splitting to the X is ‘safer’, in that you can be reasonably sure that whatever process produced an X is probably only going to concern itself with the five subtypes, and so there will not be a combinatorial explosion of types. In contrast, when dealing with a U, precisely because of the “type propagation” of point 1, a type explosion is more likely. For this reason, Julia is more likely to apply union-splitting to a union when it’s wrapped in a struct.
Suppose you have a process that returns either a T or nothing. If you return a Union{T, Nothing}, then the caller may simply proceed assuming it was a T, e.g. passing it to some function that demands a T. That is very convenient, but it can make the user forget that a nothing was possible. Using a sum type forces the consumer to remember all the possibilities, because you can’t pass an X to a function that requires an A.
Both for ease of use, and in the hopes that it would print nicely (similar to how Vector{Int} prints instead of Array{Int, 1}). For reasons I don’t yet know, it refuses to print the type-alias and insists on printing the generated type.
Moshi.jl defines a module for the ADT, and the type of the struct it creates is available as ADT.Type. See the tip at the bottom of the “Quick Example” section in Syntax & Examples | Moshi.
Vector{T} doesn’t have a custom show method, and it prints as the alias. As far as I’m aware, implementing Base.show(io::IO, ::Type{MyType}) is type-piracy and not recommended.