What's the point of algebraic data types (Moshi.jl)

Symbolics.jl uses Moshi.jl extensively, and it yields obscure types like SymbolicUtils.BasicSymbolicImpl.var"##Storage#AddMul"{T}. After reading Algebraic Data Type - Intro | Moshi, I still don’t get it. The motivating example is that

messages = [Quit(), Move(10, 20), Write("Hello, World!"), ChangeColor(255, 0, 0)]

is type unstable. Then the solution is to create a type Message type whose field is typed as Union{Quit, Move, Write, ChangeColor}. But I don’t get how that’s so fundamentally different. [Quit(), Move(10,20), ...] can also have an eltype that is Union{Quit, Move, Write, ChangeColor}. Both cases hope-and-pray that the compiler will introduce union splitting appropriately. What’s the benefit of the “algebraic data type”?

(I get the benefit of @match though, but AFAICT it would work just as well if @data defined Message = Union{Quit, Move, Write, ChangeColor})

Wrapping a union into a struct is a standard way to achieve type stability, e.g. in list comprehension. Consider this:

julia> a1 = Union{Int, Float64}[1, 2.0];

julia> a2 = Union{Int, Float64}[1, 2];

julia> typeof([i+1 for i in a1])
Vector{Real} (alias for Array{Real, 1})

julia> typeof([i+1 for i in a2]) # Type of the array has changed!
Vector{Int64} (alias for Array{Int64, 1})

julia> struct MyType # Wrap the union into a struct
           data::Union{Int, Float64}
       end

julia> Base.:+(a::MyType, b) = MyType(a.data + b)

julia> a1_stable = map(MyType, a1)
2-element Vector{MyType}:
 MyType(1)
 MyType(2.0)

julia> a2_stable = map(MyType, a2)
2-element Vector{MyType}:
 MyType(1)
 MyType(2)

julia> typeof([i+1 for i in a1_stable])
Vector{MyType} (alias for Array{MyType, 1})

julia> typeof([i+1 for i in a2_stable])  # Type of the array remains unchanged.
Vector{MyType} (alias for Array{MyType, 1})

For performance, you’re still at the mercy of the compiler. The purpose is formal type stability in the interest of correctness rather than performance.

P.S. some performance benefit is also possible, since the proliferation of types is slightly tamed, as shown by the array types in my list comprehension example.

Wrapping a union into a struct allows you to control function barriers, i.e. where dynamic dispatch happens. This has huge performance implications, but which one is better depends on your specifics.

Consider

const TypeUnion = Union{Int, Float64}
foo(x) = nothing; #imagine real code here
struct Wrapped
    member::TypeUnion
end
unwrapped = TypeUnion[1, 1.0]
wrapped = [Wrapped(1), Wrapped(1.0)]
function bar(a)
    foo(a[1]) #critical dispatch!
end

If you call bar(unwrapped), then the foo call will be a (potentially slow, depending on union splitting) dynamic dispatch. In exchange, the specialized foo implementation will exactly know the types of its inputs, making it faster.

If you call bar(wrapped), then the foo call will be a fast static dispatch. But in exchange, the foo implementation will have to deal with not knowing the exact type of x.member, potentially making it slower and incurring (potentially slow) dynamic dispatch if it calls anything with x.

So you have to judge / trade-off:

Will foo just pass through its parameters to other callees, without unwrapping them? Wrap the union into a struct. No function barrier.

Will foo do computations on the union-typed arguments, possibly in a loop? Pass the parameter unwrapped, i.e. push the dynamic dispatch up the call-stack. Standard design principle in julia, cf function barrier to enable type stability.

1 Like

This thread needs corrections… :sweat_smile:

No, that is a major misunderstanding of what the type is. Message is not a type whose field is typed as Union{Quit, Move, Write, ChangeColor}. Message is a type whose fields are a union of the fields used by non-existent types {Quit, Move, Write, ChangeColor}. Those are pseudo-types, or varients. There is only one type: Message. It looks like:

struct Message
  varient::Symbol
  x::Int
  y::Int
  message::String
  r::Float64
  g::Float64
  b::Float64
end

Now since the different varients don’t share any fields, this isn’t a great example, but if for example Message also had a ::Float64 then what you’d want to do is combine the two in a field and then getproperty alias them to save memory. But no matter, it’s a type that is a union of fields. So then effectively:

Move(x::Int, y::Int) = Message(:Move, x,y, undef, undef, undef, undef)

Move is just an alias to a type of Message. All of them are just messages. So

messages = [Quit(), Move(10, 20), Write("Hello, World!"), ChangeColor(255, 0, 0)]

is type-stable, it’s just Array{Message}.

Now when you write functions over an Array{Message}, you may want to dispatch over varients, in which case you do explicit if statements:

function myfunction(m::Message)
   if m.varient == :Quit
       # Do something
   elseif m.varient == :Move
      # Do something
    ...
end

This is what the pattern matching macros help you do:

@match message begin
    Message.Quit() => "Quit"
    Message.Move(x, y) => "Move to $(x), $(y)"
    Message.Write(msg) => "Write: $msg"
    Message.ChangeColor(r, g, b) => "Change color to ($r, $g, $b)"
    _ => "Unknown"
end

just writing out big if statements.

So then with this in mind:

This relies on compiler optimization effectively doing this under the hood which relies on certain heuristics being hit. Moshi does not rely on those compiler optimizations at all.

Nope, you’re not at the mercy of the compiler because there is no union. Everything is statically typed, no dynamic dispatching. You trade dynamic dispatch for static dispatching in terms of matching, where each time you want to do things you need to write things based on varients.

It’s not slightly tamed, there is a guarantee that there is exactly one type, and Julia has a guarentee to then not dynamic dispatch.

Yes, but algebraic data types are not wrapping a union into a struct, so this analysis is not related to what’s actually going on with Moshi. The calls with Moshi will not have dynamic dispatch in your example, instead just an if statement that is statically handled because it would be a single struct

struct Wrapped
   varient::Symbol
   member1::Int
   member2::Float64
end

and then you’d branch based on the varient. There is no type-instability for the compiler to handle.

1 Like

I don’t think that’s true; if you do a @macroexpand you can see it is indeed a wrapped union:

  begin
      struct var"##Storage#Quit"
      end
      struct var"##Storage#Move"
          #= REPL[8]:4 =#
          x::Int64
          #= REPL[8]:5 =#
          y::Int64
      end
      struct var"##Storage#Write"
          #= REPL[8]:8 =#
          var"##field#281"::String
      end
      struct var"##Storage#ChangeColor"
          #= REPL[8]:9 =#
          var"##field#282"::Float64
          #= REPL[8]:9 =#
          var"##field#283"::Float64
          #= REPL[8]:9 =#
          var"##field#284"::Float64
      end
  end
  begin
      #= /Users/eph/.julia/packages/Moshi/UlzGA/src/data/emit/type.jl:23 =#
      struct var"typeof(Message)"
          data::Union{var"##Storage#Quit", var"##Storage#Move", var"##Storage#Write", var"##Storage#ChangeColor"}
      end
      #= /Users/eph/.julia/packages/Moshi/UlzGA/src/data/emit/type.jl:24 =#
      const Type = var"typeof(Message)"
  end

This is also the approach of GitHub - ameligrana/WrappedUnions.jl: Wrap a Union for enhanced type-stability. (I think it was true of Unityper, maybe that’s what you’re remembering?)

1 Like