Symbolics.jl uses Moshi.jl extensively, and it yields obscure types like SymbolicUtils.BasicSymbolicImpl.var"##Storage#AddMul"{T}. After reading Algebraic Data Type - Intro | Moshi, I still don’t get it. The motivating example is that a vector of messages like [Quit(), Move(10, 20), ...] is type unstable. Then the solution is to create a Message type whose field is typed as Union{Quit, Move, Write, ChangeColor}. But I don’t get how that’s so fundamentally different. [Quit(), Move(10, 20), ...] can also have an eltype of Union{Quit, Move, Write, ChangeColor}. Both cases hope-and-pray that the compiler will introduce union splitting appropriately. What’s the benefit of the “algebraic data type”?
(I get the benefit of @match though, but AFAICT it would work just as well if @data defined Message = Union{Quit, Move, Write, ChangeColor})
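To make my claim concrete, here is roughly what I have in mind. The plain structs and the MessageU alias below are my own stand-ins for illustration, not what @data actually generates:

struct Quit end

struct Move
    x::Int
    y::Int
end

struct Write
    msg::String
end

struct ChangeColor
    r::Float64
    g::Float64
    b::Float64
end

const MessageU = Union{Quit, Move, Write, ChangeColor}

msgs = MessageU[Quit(), Move(10, 20), Write("hi"), ChangeColor(1.0, 0.0, 0.0)]
eltype(msgs)  # the four-member Union; small, so I'd hope the compiler union-splits over it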
Wrapping a union into a struct is a standard way to achieve type stability, e.g. in array comprehensions. Consider this:
julia> a1 = Union{Int, Float64}[1, 2.0];
julia> a2 = Union{Int, Float64}[1, 2];
julia> typeof([i+1 for i in a1])
Vector{Real} (alias for Array{Real, 1})
julia> typeof([i+1 for i in a2]) # Type of the array has changed!
Vector{Int64} (alias for Array{Int64, 1})
julia> struct MyType # Wrap the union into a struct
data::Union{Int, Float64}
end
julia> Base.:+(a::MyType, b) = MyType(a.data + b)
julia> a1_stable = map(MyType, a1)
2-element Vector{MyType}:
MyType(1)
MyType(2.0)
julia> a2_stable = map(MyType, a2)
2-element Vector{MyType}:
MyType(1)
MyType(2)
julia> typeof([i+1 for i in a1_stable])
Vector{MyType} (alias for Array{MyType, 1})
julia> typeof([i+1 for i in a2_stable]) # Type of the array remains unchanged.
Vector{MyType} (alias for Array{MyType, 1})
For performance, you’re still at the mercy of the compiler. The purpose is formal type stability in the interest of correctness rather than performance.
P.S. some performance benefit is also possible, since the proliferation of types is slightly tamed, as shown by the array types in my list comprehension example.
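To spell out the correctness point with a hypothetical continuation of the session above: code that assumes the element type stays fixed keeps working with the wrapper, but can break once a comprehension has silently narrowed the array type.

julia> b2 = [i+1 for i in a2];  # Vector{Int64}, because every element happened to be an Int

julia> push!(b2, 2.5)  # a Float64 result no longer fits
ERROR: InexactError: Int64(2.5)

julia> b2_stable = [i+1 for i in a2_stable];  # Vector{MyType}, whatever the contents

julia> push!(b2_stable, MyType(2.5));  # fine: the element type never changed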
Wrapping a union into a struct allows you to control function barriers, i.e. where dynamic dispatch happens. This has huge performance implications, but which one is better depends on your specifics.
Consider
const TypeUnion = Union{Int, Float64}
foo(x) = nothing; #imagine real code here
struct Wrapped
member::TypeUnion
end
unwrapped = TypeUnion[1, 1.0]
wrapped = [Wrapped(1), Wrapped(1.0)]
function bar(a)
foo(a[1]) #critical dispatch!
end
If you call bar(unwrapped), then the foo call will be a (potentially slow, depending on union splitting) dynamic dispatch. In exchange, the specialized foo implementation will exactly know the types of its inputs, making it faster.
If you call bar(wrapped), then the foo call will be a fast static dispatch. But in exchange, the foo implementation will have to deal with not knowing the exact type of x.member, potentially making it slower and incurring (potentially slow) dynamic dispatch if it calls anything with x.
So you have to judge the trade-off:
Will foo just pass its parameters through to other callees, without unwrapping them? Wrap the union into a struct. No function barrier.
Will foo do computations on the union-typed arguments, possibly in a loop? Pass the parameter unwrapped, i.e. push the dynamic dispatch up the call stack. This is a standard design principle in Julia, cf. the function barrier to enable type stability (a rough sketch of this case follows).
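Here is that second case sketched out, reusing unwrapped from above (work and bar_unwrapped are names made up for this illustration):

# work always sees a concrete x, so its loop is fully type-stable.
work(x, n) = sum(x + i for i in 1:n)

function bar_unwrapped(a, n)
    x = a[1]           # x is Union{Int, Float64}: known to inference, but not concrete
    return work(x, n)  # dispatch happens once here, at the function barrier (possibly union-split by the compiler)
end

bar_unwrapped(unwrapped, 1_000)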
No, that is a major misunderstanding of what the type is. Message is not a type whose field is typed as Union{Quit, Move, Write, ChangeColor}. Message is a type whose fields are a union of the fields used by the non-existent types {Quit, Move, Write, ChangeColor}. Those are pseudo-types, or variants. There is only one type: Message. It looks like:
struct Message
    variant::Symbol
    x::Int
    y::Int
    message::String
    r::Float64
    g::Float64
    b::Float64
end
Now since the different variants don’t share any fields, this isn’t a great example, but if, for example, another variant also had a ::Float64 field, then what you’d want to do is combine the two into one field and alias them via getproperty to save memory (a rough sketch of that idea is below). But no matter: effectively, it’s a type that is a union of fields.
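As a sketch of what that getproperty aliasing could look like (the Packed struct and the field names here are made up for illustration, not Moshi’s actual storage layout):

struct Packed
    variant::Symbol
    f::Float64  # one stored slot shared by two logical Float64 fields
end

function Base.getproperty(p::Packed, name::Symbol)
    # Say ChangeColor's r and some other variant's speed both live in f:
    if name === :r || name === :speed
        return getfield(p, :f)
    end
    return getfield(p, name)
end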
Now when you write functions over an Array{Message}, you may want to dispatch over the variants, in which case you write explicit if statements:
function myfunction(m::Message)
    if m.variant == :Quit
        # Do something
    elseif m.variant == :Move
        # Do something
    # ... and so on for the other variants
    end
end
This is what the pattern matching macros help you do:
@match message begin
    Message.Quit() => "Quit"
    Message.Move(x, y) => "Move to $(x), $(y)"
    Message.Write(msg) => "Write: $msg"
    Message.ChangeColor(r, g, b) => "Change color to ($r, $g, $b)"
    _ => "Unknown"
end
instead of just writing out those big if statements by hand.
So then with this in mind:
Hoping that the compiler will union-split relies on a compiler optimization effectively doing this under the hood, which in turn relies on certain heuristics being hit. Moshi does not rely on those compiler optimizations at all.
Nope, you’re not at the mercy of the compiler, because there is no union. Everything is statically typed, with no dynamic dispatch. You trade dynamic dispatch for static dispatch in the matching: each time you want to do something, you write it out based on the variants.
The proliferation of types is not just slightly tamed: there is a guarantee that there is exactly one type, and Julia then guarantees it will not dynamically dispatch.
Yes, but algebraic data types are not wrapping a union into a struct, so this analysis is not related to what’s actually going on with Moshi. The calls with Moshi will not involve dynamic dispatch in your example; instead there is just an if statement that is handled statically, because it would be a single struct
struct Wrapped
    variant::Symbol
    member1::Int
    member2::Float64
end
and then you’d branch based on the variant. There is no type instability for the compiler to handle.
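For concreteness, here is a minimal re-sketch of the earlier bar/foo example on top of that single struct (foo, bar, and the :int/:float tags are made up; this is not Moshi’s generated code):

foo(x::Wrapped) = x.variant === :int ? float(x.member1 + 1) : x.member2 * 2.0

bar(a) = foo(a[1])  # eltype(a) is the concrete Wrapped, so this call is a static dispatch

wrapped = [Wrapped(:int, 1, 0.0), Wrapped(:float, 0, 2.5)]
bar(wrapped)  # the only branching on the variant is an ordinary, statically compiled branch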