Coming from functional programming, one of the features I have been missing the most in Julia are Algebraic Data Types. Packages such as MLStyle are doing a pretty good job at adding them into the language but I am more interested in how to emulate them using idiomatic Julia code in this post.
Some Julia users have expressed skepticism about adding ADTs in the language on the ground that in many usecases, pattern matching can be emulated using multiple dispatch. For example, we can define a type for simple expressions as follows:
abstract type Expression end
struct Const <: Expression
value :: Int
end
struct Add <: Expression
lhs :: Expression
rhs :: Expression
end
evaluate(e::Const) = e.value
evaluate(e::Add) = evaluate(e.lhs) + evaluate(e.rhs)
evaluate(Add(Const(2), Const(3))) # evaluates to `5`
One difference between this code and traditional ADTs is that Expr
here is not closed: anyone can add a new subtype at any moment. This prevents static exhaustive checks but this is not what worries me here. What I am worried about is performances. Indeed:
- The fields of
Add
have abstract types, which is normally a big performance red flag - Even in the absence of a recursive definition, users want to manipulate objects such as vector of expressions (
Vector{Expr}
) and the same problem happens then.
A tentative fix would be to do something like this
abstract type Expression end
const AnyExpression = Union{Const, Add}
struct Const <: Expression
value :: Int
end
struct Add <: Expression
lhs :: AnyExpression
rhs :: AnyExpression
end
Unfortunately, this code does not compile because AnyExpression
and Add
are mutually recursive. Also, assuming it compiled, I am not sure how much I can trust Julia’s implementation to always do the smart thing with those union types, especially when they get bigger (imagine an expression definition with 10 cases).
So, here are my questions:
- Are you aware of any idiom to define fast ADTs in Julia?
- How much of a performance penalty would you expect when going with the naive solution I sketched in the first listing?
- In the case of non-recursive ADTs, would the trick of defining a union type to gather all cases always lead to efficient code (when working with
Vector{AnyExpr}
for example)? - Do packages such as
MLStyle
address the performance issues I am pointing out? (This may be a good question for @thautwarm)
Edit: I found a way to encode closed recursive ADTs in Julia and benchmarked this solution against a naive solution. Unfortunately, it performs worse (4.7ÎĽs vs 3.3ÎĽs for the naive solution), probably due to having to do more allocations.
Edit 2: I also benchmarked Julia against OCaml on manipulating ADTs and Julia is only 2x slower. This makes me feel better about encoding ADTs in Julia.