The main thing that makes this painful in julia is that performance tends to be quite dependent on type inference because our dynamic dispatches are so costly (due to everything being generic functions => huge method table). However, clever use of things like function barriers and @nospecialize
can help a lot.
Concretely, I want the following function to be as fast as possible:
function evaluate_sum(es::Vector{Expression})
s = 0
for e in es
s += evaluate(e)
end
return s
end
In a functional language where Expression
is a closed ADT, the “dispatch” that happens at every call to evaluate
costs almost nothing (just making a switch on an integer tag). I am worried that this may not be true in Julia. I guess I should run concrete benchmarks to see how problematic it is in practice.
I am worried about the cost of dispatch, exactly. Any idea on how to make it as small as possible in my evaluate_sum
example?
How fast are you looking for this to be? It’s already quite fast:
abstract type Expression end
struct Const{T} <: Expression
value :: T
end
struct Add{L, R} <: Expression
lhs::L
rhs::R
end
evaluate(e::Const) = e.value
evaluate(e::Add) = evaluate(e.lhs) + evaluate(e.rhs)
function evaluate_sum(exprs::Vector{Expression})
s = 0
for e in exprs
s += evaluate(e)
end
s
end
julia> es = [Const(1)
Add(Const(2), Const(3))
Add(Add(Const(1), Const(3)), Const(4))
Const(40)]
4element Array{Expression,1}:
Const{Int64}(1)
Add{Const{Int64},Const{Int64}}(Const{Int64}(2), Const{Int64}(3))
Add{Add{Const{Int64},Const{Int64}},Const{Int64}}(Add{Const{Int64},Const{Int64}}(Const{Int64}(1), Const{Int64}(3)), Const{Int64}(4))
Const{Int64}(40)
julia> @btime evaluate_sum(es)
127.472 ns (0 allocations: 0 bytes)
52
I’m not really sure what to compare it to though.
Turns out it’s much better to just go with your original idea and have abstract storage:
abstract type Expression end
struct Const <: Expression
value :: Int
end
struct Add <: Expression
lhs :: Expression
rhs :: Expression
end
evaluate(e::Const) = e.value
evaluate(e::Add) = evaluate(e.lhs) + evaluate(e.rhs)
function evaluate_sum(exprs::Vector{Expression})
s = 0
for e in exprs
x = let e = e
evaluate(e)
end
s += x
end
s
end
es = [Const(1)
Add(Const(2), Const(3))
Add(Add(Const(1), Const(3)), Const(4))
Const(40)]
julia> @btime evaluate_sum(es);
15.159 ns (0 allocations: 0 bytes)
(note you’ll need to restart julia to run this due to type redefinitions)
I ran the following benchmark to compare the cost of doing dispatch on closed unions with the cost of doing dispatch on abstract types:
using BenchmarkTools
abstract type SignedInteger end
struct Pos <: SignedInteger
abs :: UInt64
end
struct Neg <: SignedInteger
abs :: UInt64
end
const AnySignedInteger = Union{Pos, Neg}
const posvec = [Pos(i) for i in 1:100]
const negvec = [Neg(i) for i in 1:100]
value(x::Pos) = Int64(x.abs)
value(x::Neg) = Int64(x.abs)
function test_open()
return sum(value(x) for x in SignedInteger[posvec; negvec])
end
function test_closed()
return sum(value(x) for x in AnySignedInteger[posvec; negvec])
end
println("Testing open version")
@btime test_open()
println("Testing closed version")
@btime test_closed()
The result:
Testing open version
1.792 μs (202 allocations: 4.92 KiB)
Testing closed version
697.020 ns (2 allocations: 2.02 KiB)
Conclusion: it is about 2.3x faster to do dispatch on a closed union type. I actually expected more of a difference, which makes me think that my naive solution would actually not be prohibitively slow compared to something smarter.
Interesting. This means that abstract storage is not as slow as I would have thought.
Do you have any idea why this is faster than the version where you add type parameters to Add
and Const
?
No, I’m actually a bit perplexed as this code:
abstract type Expression end
struct Const <: Expression
value :: Int
end
struct Add{L, R} <: Expression
lhs :: L
rhs :: R
end
const AddAbstract = Add{Expression, Expression}
evaluate(e::Const) = e.value
evaluate(e::Add) = evaluate(e.lhs) + evaluate(e.rhs)
function evaluate_sum(exprs::Vector{Expression})
s = 0
for e in exprs
s += evaluate(e)
end
s
end
es = [Const(1)
AddAbstract(Const(2), Const(3))
AddAbstract(AddAbstract(Const(1), Const(3)), Const(4))
Const(40)]
julia> @btime evaluate_sum($es);
49.006 ns (0 allocations: 0 bytes)
produces identical @code_warntype
as the other version, but is slower. This suggests that perhaps the problem is on the LLVM side.
Update: I found a way to encode closed recursive ADTs in Julia and benchmarked this solution against a naive solution. Unfortunately, it performs worse (4.7μs vs 3.3μs for the naive solution), probably due to having to do more allocations.
If anyone finds a better way, please tell me!
Naive solution
abstract type Expression end
struct Const <: Expression
value :: Int
end
struct Var <: Expression
varname ::String
end
struct Add <: Expression
lhs :: Expression
rhs :: Expression
end
evaluate(e::Const, env) = e.value
evaluate(e::Var, env) = env[e.varname]
evaluate(e::Add, env) = evaluate(e.lhs, env) + evaluate(e.rhs, env)
function sum_of_ints(n)
if n == 1
return Const(1)
else
return Add(Const(n), sum_of_ints(n  1))
end
end
using BenchmarkTools
@btime evaluate(sum_of_ints(100), Dict{String, Int}())
3.228 μs (202 allocations: 5.23 KiB)
Solution that encodes recursive closed ADTs
struct Const
value :: Int
end
struct Var
varname ::String
end
struct Add{E}
lhs :: E
rhs :: E
end
struct Expression
ctor :: Union{Const, Var, Add{Expression}}
end
mkConst(value) = Expression(Const(value))
mkAdd(lhs, rhs) = Expression(Add{Expression}(lhs, rhs))
mkVar(var) = Expression(Var(var))
evaluate(e::Expression, env) = evaluate(e.ctor, env)
evaluate(e::Const, env) = e.value
evaluate(e::Var, env) = env[e.varname]
evaluate(e::Add, env) = evaluate(e.lhs, env) + evaluate(e.rhs, env)
function sum_of_ints(n)
if n == 1
return mkConst(1)
else
return mkAdd(mkConst(n), sum_of_ints(n  1))
end
end
using BenchmarkTools
@btime evaluate(sum_of_ints(100), Dict{String, Int}())
4.681 μs (396 allocations: 8.25 KiB)
Maybe I’m missing something, but isn’t it nice that the simple solution is faster? At any rate, I don’t think the performance difference between your two solutions is particularly large.
@CameronBieganek What this experiment indicates, in my opinion, is that the compiler misses an opportunity to optimize the second version. In a perfect world, the second version should be faster as the compiler would leverage the fact that an expression can be nothing else other than a Const
, a Var
or an Add
, and make dynamic dispatch very fast based on this.
So a more interesting comparison would be to compare the time it takes to evaluate the naive version in Julia with an equivalent program written in a language with native ADTs such as OCaml, Haskell or Rust. I am going to try this now.
So I did the experiment in OCaml, which is about twice as fast as the Julia version (1.61μs vs 3.23μs). This is actually not too bad for Julia and this makes me feel better about using ADTs in Julia.
Benchmark Code
type expr =
 Const of int
 Var of string
 Add of expr * expr
let rec evaluate expr env =
match expr with
 Const v > v
 Var x > List.Assoc.find_exn env ~equal:String.equal x
 Add (lhs, rhs) > evaluate lhs env + evaluate rhs env
let rec sum_of_ints = function
 1 > Const 1
 n > Add (Const n, sum_of_ints (n  1))
let profile n =
let acc = ref 0 in
let t = Caml.Sys.time () in
for i = 1 to n do
acc := !acc + evaluate (sum_of_ints 100) []
done;
let dt = (Caml.Sys.time () . t) /. (Float.of_int n) in
Stdio.printf "Average time: %.3f μs" (dt *. 1e6);
acc
let _ = profile 1000000
Average time: 1.653 μs
This is similar to open types, and static exhaustive checking wouldn’t get affected if your analyzer can walk through the whole program.
I’m sorry that MLStyle didn’t address this performance issue.
Actually I did consider the questions you raised here, and due to the restrictions of Julia I don’t really find out an approach.
Also, there is a technique to alter ADTs, called tagless final.
ADT approach is called initial approach in some context, and tagless final is called the final approach in this scope.
For your code, we can use tagless final, to achieve stably typed Julia program:
struct SYM{F1, F2}
constant :: F1
add :: F2
end
function constant(v)
function (sym::SYM)
sym.constant(v)
end
end
function add(term1, term2)
function (sym::SYM)
sym.add(term1(sym), term2(sym))
end
end
# self algebra
self = SYM(constant, add)
evaluate =
let constant(v::Int) = v,
add(l::Int, r::Int) = l + r
SYM(constant, add)
end
println(add(constant(2), constant(3))(evaluate))
@code_warntype add(constant(2), constant(3))(evaluate)
There’re no red points, try above codes in your Julia shell
5
Variables
#self#::var"#17#18"{var"#15#16"{Int64},var"#15#16"{Int64}}
sym::Core.Compiler.Const(SYM{var"#constant#19",var"#add#20"}(var"#constant#19"(), var"#add#20"()), false)
Body::Int64
1 ─ %1 = Base.getproperty(sym, :add)::Core.Compiler.Const(var"#add#20"(), false)
│ %2 = Core.getfield(#self#, :term1)::var"#15#16"{Int64}
│ %3 = (%2)(sym)::Int64
│ %4 = Core.getfield(#self#, :term2)::var"#15#16"{Int64}
│ %5 = (%4)(sym)::Int64
│ %6 = (%1)(%3, %5)::Int64
└── return %6
Thanks @thautwarm! I’ve heard FP people talk a lot about final tagless, and I’ve explored it a bit. But I still don’t really understand its full potential, or any limitations. Are there situations you would definitely reach for this, or any where you would avoid it?
BTW for a nice interface for your example, you can do
julia> (sym::SYM)(term) = term(sym)
julia> evaluate(add(constant(2), constant(3)))
5
Like @cscherrer, I am still not sure that I can see the full potential of “final tagless ADTs”.
Also, I do not see how it improves on the simple solution from @Mason:
abstract type Expression end
struct Const{T} <: Expression
value :: T
end
struct Add{L, R} <: Expression
lhs::L
rhs::R
end
evaluate(e::Const) = e.value
evaluate(e::Add) = evaluate(e.lhs) + evaluate(e.rhs)
In both case, some specialized code is generated to evaluate a single, specific expression (or, more rigorously, a small family of expressions sharing the exact same tree structure) and so no tags are needed. Dispatch does not happen at runtime but during JIT compilation.
However, I have a hard time finding a lot of situations where this is really what you want. When working with a large number of expressions, you probably do not want to compile one version of the evaluation function per expression to evaluate. And even when working with a small number of expressions, can the savings that result from avoiding dynamic dispatch outweigh the increased cost of JIT compilation?
One pain point that I’ve never seen solved in Julia is building trees topdown, for example for a decision tree. You might try something like
abstract type Tree
struct Branch{L,R} <: Tree
left :: L
right :: R
end
struct Leaf{T} <: Tree
value :: T
end
But the L
and R
values aren’t known until the tree is done. Could final tagless be better for this?
@cscherrer Thanks for the wrapping!
Tagless final can do everything ADTs and GADTs can do, and IIRC there shall be some menchanical methods to transform your code from ADT approach to tagless final approach.
The advantages of ADT(initial approach) and final approach are different:
When you’re using ADTs, things’re totally straightforward because you see concrete data
ADTs(tagged unions) are signals, data and descriptors.
When you want to use them, you’re supposed to write interpreters/evaluators for your ADTs, like writing pattern matching to decide constructorspecific behaviors.
When you’re using tagless final, you manipulate operations extracted from the ADT data, and the data turns out to be unnecessary
Tagless final

avoids the use of too many tags(hence, tagless). It catches the observation that, what you’ll finally do(operations) with your ADTs can live without ADTs.

encodes the ADTs to postorder visiting functions, which can be composed to make bigger operations, just like how we compose ADTs to construct recursive data.

has many other advantages, say, it can be used to achieve pattern matching without metaprogramming libraries like MLStyle.jl or Match.jl, and the implemented pattern matching is naturally firstclass.
tagless final is usually more concise because it contains a postorder visiting, which you shall manually implement when you’re using ADTs.
Unfortunately, above code to transform ADTs to tagless final does not directly apply to all Julia code, because Julia is a strict programming language.
Given a term Add(left, right)
, you might want to visit Add
node firstly, and might ignore the visiting of left
and right
, i.e, you want preorder visiting.
However, with tagless final, the subcomponents in Add(left, right)
are always firstly visited, so, to support preorder visiting or other visiting strategies, things can be a little disturbing.
Hi, I agree with you.
I just want to post some thing here to give your a point of view, that, tags are in fact unnecessary, because we just match tags and then perform operations among them.
The final operations are always what we actually need.
For example, if we just come to your example, tagless final encoding is also concise(my previous reply just shows many underlying structures):
struct HowWeWorkWithExpr{F1, F2}
constant :: F1
add :: F2
end
evaluate =
let constant(v::Int) = v,
add(l, r) = l + r
HowWeWorkWithExpr(constant, add)
end
It’s equivalent to Mason’s code, because the initial approach is actually almost equivalent to the final approach.
Hence I have to say there’re no improvements.
However, I personally prefer tagless final because in this way
 we don’t need too many type parameters for each constructors
 we don’t need abstract types
 fewer global variables
 fewer data types
Further, if you try to make a larger ADT and its evaluation, you might find some interesting convenience that tagless final brings about:
you always do not need to write code for picking fields/contents from data, which makes me feel good for my shoulders…
Thanks Chad, this is really a good example!
See this:
struct Tree{C1, C2}
branch :: C1
leaf :: C2
end
# polymorphic branch/ `branch` constructor
branch(left, right) =
function run(mod)
mod.branch(left(mod), right(mod))
end
# polymorphic leaf/ `leaf` constructor
leaf(value) =
function run(mod)
mod.leaf(value)
end
# eval to Int
tree_sum_eval = Tree(
+, # how to evaluate branch
identity # how to evaluate leaf
)
# eval to a function (indent::String)::String
tree_print = Tree(
function (left, right)
function run(indent)
println(indent, "")
left(indent * " ")
right(indent * " ")
end
end,
function (value)
function run(indent)
println(indent, value)
end
end
)
# a tree
tree =
branch(
branch(leaf(1), leaf(2)),
branch(
leaf(2),
branch(leaf(5), leaf(2))
)
)
tree(tree_print)("")
println(tree(tree_sum_eval))
run it:
λ julia a.jl


1
2

2

5
2
12
Thanks for your detailed answer and for your great work on MLStyle.jl.