Are there idioms in Julia for fast Algebraic Data Types (ADT)?

Thanks @thautwarm! I’ve heard FP people talk a lot about final tagless, and I’ve explored it a bit. But I still don’t really understand its full potential, or any limitations. Are there situations you would definitely reach for this, or any where you would avoid it?

BTW for a nice interface for your example, you can do

julia> (sym::SYM)(term) = term(sym)

julia> evaluate(add(constant(2), constant(3)))
5
2 Likes

Like @cscherrer, I am still not sure that I can see the full potential of “final tagless ADTs”.

Also, I do not see how it improves on the simple solution from @Mason:

abstract type Expression end
struct Const{T} <: Expression
    value :: T
end
struct Add{L, R} <: Expression
    lhs::L
    rhs::R
end

evaluate(e::Const) = e.value
evaluate(e::Add) = evaluate(e.lhs) + evaluate(e.rhs)

In both cases, some specialized code is generated to evaluate a single, specific expression (or, more rigorously, a small family of expressions sharing the exact same tree structure), so no tags are needed. Dispatch does not happen at runtime but during JIT compilation.
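
To make this concrete with Mason’s definitions above, here is a small sketch: every distinct tree shape is a distinct concrete type, so each shape triggers its own specialization of evaluate.

e1 = Add(Const(1), Const(2))                 # Add{Const{Int64}, Const{Int64}}
e2 = Add(Add(Const(1), Const(2)), Const(3))  # a different concrete type
typeof(e1) == typeof(e2)                     # false: each shape gets its own compiled evaluate
evaluate(e1), evaluate(e2)                   # (3, 6)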

However, I have a hard time finding a lot of situations where this is really what you want. When working with a large number of expressions, you probably do not want to compile one version of the evaluation function per expression to evaluate. And even when working with a small number of expressions, can the savings that result from avoiding dynamic dispatch outweigh the increased cost of JIT compilation?

1 Like

One pain point that I’ve never seen solved in Julia is building trees top-down, for example for a decision tree. You might try something like

abstract type Tree end

struct Branch{L,R} <: Tree
    left :: L
    right :: R
end

struct Leaf{T} <: Tree
    value :: T
end

But the L and R types aren’t known until the tree is complete. Could final tagless be better for this?
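
To make the pain concrete, here is a hypothetical growing step (Placeholder is made up for the illustration):

struct Placeholder <: Tree end

partial = Branch(Placeholder(), Placeholder())   # Branch{Placeholder, Placeholder}

# "Filling in" a real subtree later means rebuilding every ancestor, because a
# Branch{Placeholder, Placeholder} can never hold a Branch or Leaf child.
grown = Branch(Branch(Leaf(1), Leaf(2)), partial.right)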

3 Likes

@cscherrer Thanks for the wrapping!

Tagless final can do everything ADTs and GADTs can do, and IIRC there are mechanical methods to transform your code from the ADT approach to the tagless final approach.

The advantages of the ADT (initial) approach and the final approach are different:

When you’re using ADTs, things are totally straightforward because you see concrete data.

ADTs (tagged unions) are signals, data, and descriptors.

When you want to use them, you’re supposed to write interpreters/evaluators for your ADTs, like writing pattern matching to decide constructor-specific behaviors.

When you’re using tagless final, you manipulate operations extracted from the ADT data, and the data itself turns out to be unnecessary.

Tagless final

  • avoids the use of too many tags (hence, tagless). It captures the observation that what you’ll finally do (the operations) with your ADTs can live without the ADTs.

  • encodes the ADTs as post-order visiting functions, which can be composed into bigger operations, just as we compose ADT constructors to build recursive data.

  • has many other advantages; for example, it can be used to implement pattern matching without metaprogramming libraries like MLStyle.jl or Match.jl, and the resulting pattern matching is naturally first-class.

Tagless final is usually more concise because the post-order visiting is built in, whereas you have to implement it manually when using ADTs.

Unfortunately, the above code for transforming ADTs to tagless final does not directly apply to all Julia code, because Julia is a strict (eagerly evaluated) language.

Given a term Add(left, right), you might want to visit the Add node first, and possibly skip visiting left and right, i.e., you want pre-order visiting.

However, with tagless final, the sub-components in Add(left, right) are always visited first, so supporting pre-order visiting or other visiting strategies can be a little awkward.
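
One workaround (just a sketch of mine, with made-up names): interpret each term into a thunk () -> value rather than into a value, so that a parent rule can decide whether and when to force its children.

struct LazySYM{F1, F2}
    constant :: F1
    add      :: F2
end

# terms are functions of the interpreter, as before
constant(v) = m -> m.constant(v)
add(l, r)   = m -> m.add(l(m), r(m))

lazy_eval = LazySYM(
    v      -> (() -> v),          # leaf: a thunk that returns the value
    (l, r) -> (() -> l() + r())   # parent: receives thunks, forces them only when forced itself
)
# A rule could also return l unchanged and never force the right child at all.

thunk = add(constant(2), constant(3))(lazy_eval)   # no addition has happened yet
thunk()                                            # == 5

Note that the traversal that builds the thunks is still bottom-up, so this only postpones the work; it is not true pre-order visiting.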

Hi, I agree with you.

I just want to post something here to offer a point of view: tags are in fact unnecessary, because we only match on tags and then perform operations on the contents.

The final operations are always what we actually need.

Coming back to your example: the tagless final encoding is also concise (my previous reply just spelled out many of the underlying structures):

struct HowWeWorkWithExpr{F1, F2}
    constant :: F1
    add :: F2
end

evaluate = 
     let constant(v::Int) = v,
         add(l, r) = l + r
         HowWeWorkWithExpr(constant, add)
     end
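
To see it in action, here is a sketch of building a term and running it against this interpreter (the polymorphic constant/add constructors are restated from my previous reply):

constant(v) = m -> m.constant(v)
add(l, r)   = m -> m.add(l(m), r(m))

expr = add(constant(2), constant(3))
expr(evaluate)   # == 5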

It’s equivalent to Mason’s code, because the initial approach is actually almost equivalent to the final approach.

Hence I have to say there are no improvements.

However, I personally prefer tagless final because in this way

  • we don’t need too many type parameters for each constructor
  • we don’t need abstract types
  • fewer global variables
  • fewer data types

Further, if you try to make a larger ADT and its evaluation, you might find some interesting convenience that tagless final brings about:

you never need to write code for picking fields/contents out of the data, which goes easy on my shoulders…
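
For instance (with a hypothetical Mul case, just to illustrate): in the ADT style you pick the fields out of the node yourself, while in tagless final the already-processed pieces simply arrive as arguments.

# ADT style: you destructure the node by hand
struct Mul{L, R}
    lhs :: L
    rhs :: R
end
eval_adt(e::Mul) = eval_adt(e.lhs) * eval_adt(e.rhs)
eval_adt(x::Int) = x

# tagless final style: the rule just receives the already-evaluated children
mul_rule(l, r) = l * r

eval_adt(Mul(2, Mul(3, 4)))   # == 24
mul_rule(2, mul_rule(3, 4))   # == 24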

5 Likes

Thanks Chad, this is really a good example!

See this:

struct Tree{C1, C2}
    branch :: C1
    leaf :: C2
end

# polymorphic branch/ `branch` constructor
branch(left, right) =
    function run(mod)
        mod.branch(left(mod), right(mod))
    end

# polymorphic leaf/ `leaf` constructor
leaf(value) =
    function run(mod)
        mod.leaf(value)
    end


# eval to Int
tree_sum_eval = Tree(
    +, # how to evaluate branch
    identity # how to evaluate leaf
)

# eval to a function (indent::String)::String
tree_print = Tree(
    function (left, right)
        function run(indent)
            println(indent, "-")
            left(indent * "  ")
            right(indent * "  ")
        end
    end,
    function (value)
        function run(indent)
            println(indent, value)
        end
    end
)

# a tree

tree =
    branch(
        branch(leaf(1), leaf(2)),
        branch(
            leaf(2),
            branch(leaf(5), leaf(2))
        )
    )

tree(tree_print)("")

println(tree(tree_sum_eval))

run it:

λ julia a.jl
-
  -
    1
    2
  -
    2
    -
      5
      2
12
3 Likes

Thanks for your detailed answer and for your great work on MLStyle.jl. :slight_smile:

1 Like

Thanks @thautwarm, I made a little playground to explore these ideas:

Oh, looks good!
Besides, I think so-called delimited continuations can also solve the problem of building trees top-down.
However, the problem with using tagless final or delimited continuations is that they don’t actually build a concrete tree node before the child nodes are built; things are just postponed.

2 Likes

Have you tried delimited continuation in Julia? I’ve used continuations but haven’t gotten my head around the delimited version.

I think TaglessTrees and IRTools could be good together for abstract interpretation. Make everything a call, then decide what each call should do for a given interpreter.

1 Like

No, I haven’t. I’m learning about it in my courses now; it has some amazing use cases. (However, I don’t really like how extremely complex it is.)

Good observations!

Yes, once everything becomes a call, everything is hookable, and an interpreter can simply be extended (derived) from another.

You just inspired me, and I feel this is cool and user-friendly:

Tree{C1, C2} = NamedTuple{(:branch, :leaf), Tuple{C1, C2}}

# given some evaluator::Tree, derive a new one by overriding a field:
new_evaluator = (; evaluator..., leaf = some_extension(evaluator.leaf))

looks good…
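
A quick runnable check of that idea (the two evaluators are made up; branch/leaf are the term constructors from my earlier post, restated):

branch(left, right) = m -> m.branch(left(m), right(m))
leaf(value)         = m -> m.leaf(value)

sum_eval   = (branch = +, leaf = identity)     # any NamedTuple with these fields works
count_eval = (; sum_eval..., leaf = _ -> 1)    # derive a new interpreter by overriding one field

t = branch(branch(leaf(1), leaf(2)), leaf(3))
t(sum_eval)    # == 6
t(count_eval)  # == 3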

2 Likes

I found this very puzzling, too. So I looked into what the compiler is doing.

(I put the “naive solution” code in a NaiveSolution module, so that’s why you are seeing the NaiveSolution. prefix.)

This is what the optimized function does at the beginning:

julia> @code_typed NaiveSolution.evaluate(NaiveSolution.sum_of_ints(2), Dict{String,Int}())
CodeInfo(
1 ── %1  = Base.getfield(e, :lhs)::Main.NaiveSolution.Expression
│    %2  = Main.NaiveSolution.evaluate::Core.Compiler.Const(Main.NaiveSolution.evaluate, false)
│    %3  = (isa)(%1, Main.NaiveSolution.Const)::Bool
└───       goto #3 if not %3
2 ── %5  = π (%1, Main.NaiveSolution.Const)
│    %6  = Base.getfield(%5, :value)::Int64
└───       goto #12

This is equivalent to

if e.lhs isa Const
    e.lhs.value
end

i.e., the compiler eliminates the dynamic dispatch to evaluate(::Const) and then inlines the method body.

The compiler does this for Var:

3 ── %8  = (isa)(%1, Main.NaiveSolution.Var)::Bool
└───       goto #9 if not %8
...

and for Add

9 ── %24 = (isa)(%1, Main.NaiveSolution.Add)::Bool
└───       goto #11 if not %24
...

Interestingly, the compiler also creates the branch where e.lhs is none of the above (and infers that its return type is Int):

11 ─ %29 = Main.NaiveSolution.evaluate(%1, env)::Int64
└───       goto #12
...

So, the compiler generates what is equivalent to

lhs = e.lhs
if lhs isa Const
    ...
elseif lhs isa Var
    ...
elseif lhs isa Add
    ...
else
    ...
end

where ... in each if branch inlines the relevant definition of evaluate.

Then there is a piece of similar code for e.rhs.

Interestingly, it looks like the compiler can prove that everything evaluates to Int. So it can eliminate the dynamic dispatch for + and infer the return type:

12 ┄ %31 = φ (#2 => %6, #8 => %20, #10 => %27, #11 => %29)::Int64
...
23 ┄ %62 = φ (#13 => %37, #19 => %51, #21 => %58, #22 => %60)::Int64
│    %63 = Base.add_int(%31, %62)::Int64
└───       return %63
) => Int64

I think what this example indicates is that, since the compiler knows the whole type tree and the method table, we don’t need to tell it that Expression is closed.

Though I’m still puzzled why the “closed ADTs” version is slower than the “naive” version. Looking at the typed IR, the quality of inference looks equivalent. It’d also be nice to know what kind of optimization is missing w.r.t. the OCaml example.

1 Like

That is because the construction of AnySignedInteger is more expensive.
If we preallocate things in this way:

open_data = SignedInteger[posvec; negvec]
closed_data = AnySignedInteger[posvec; negvec]

function test_open()
    return sum(value(x) for x in open_data)
end

function test_closed()
    return sum(value(x) for x in closed_data)
end


julia> @btime test_closed()
  142.756 ns (0 allocations: 0 bytes)
0

julia> @btime test_open()
  288.318 ns (0 allocations: 0 bytes)
0

Then closed is faster.

I tried pre-allocating the expression object:

function benchmarkable()
    @benchmarkable evaluate($(sum_of_ints(100)), Dict{String, Int}())
end

but the “naive” solution is still faster for me:

julia> run(NaiveSolution.benchmarkable())
BenchmarkTools.Trial:
  memory estimate:  608 bytes
  allocs estimate:  4
  --------------
  minimum time:     827.000 ns (0.00% GC)
  median time:      941.000 ns (0.00% GC)
  mean time:        1.186 μs (0.00% GC)
  maximum time:     35.646 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> run(ClosedADT.benchmarkable())
BenchmarkTools.Trial:
  memory estimate:  608 bytes
  allocs estimate:  4
  --------------
  minimum time:     1.274 μs (0.00% GC)
  median time:      1.505 μs (0.00% GC)
  mean time:        1.673 μs (0.00% GC)
  maximum time:     36.319 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

And I don’t get why the closed solution can be faster for evaluate. The “naive” solution is already doing the union splitting. Looking at the IR, that’s also true for SignedInteger. I think the difference in speed comes from the layout of the array (this part of Julia’s internals is optimized to handle Union{T,Missing} etc.), not from the user code.

(Also, closed was already faster for SignedInteger/AnySignedInteger case: Are there idioms in Julia for fast Algebraic Data Types (ADT)? - #12 by jonathan-laurent)

Oh yes, sorry for this.

However, after preallocating as I did, the difference becomes really minor.

The original code had a performance gap on my computer:

julia> @btime test_open()
  1.600 μs (202 allocations: 4.92 KiB)
0

julia> println("Testing closed version")
Testing closed version

julia> @btime test_closed()
  612.139 ns (2 allocations: 2.02 KiB)
0

No worries :slight_smile: I just thought it’s nice that the compiler does union splitting even for abstract types. It means that we don’t need to worry too much about the closedness. I think it is in line with your earlier comment:

2 Likes

And the tagless final implementation is 9 times slower than the naive version, even though everything is typed and inlined:

tagless final of addition arithmetic evaluation:
julia> @code_typed evaluate_tf(sum_of_ints_tf(100))
CodeInfo(
1 ─ %1 = Core.getfield(term, :term1)::var"#4#5"{Int64}
│ %2 = Core.getfield(%1, :v)::Int64
│ %3 = Core.getfield(term, :term2)::var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64}
,var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{v
ar"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int
64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7
"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{
Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#
6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#
5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},va
r"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"
#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64}
,var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{v
ar"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int
64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7
"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{
...
var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#4#5"{Int64}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
│ %43 = Core.getfield(%42, :term1)::var"#4#5"{Int64}
│ %44 = Core.getfield(%43, :v)::Int64
│ %45 = Core.getfield(%42, :term2)::var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"
...
{Int64}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
│ %202 = Core.getfield(%201, :term1)::var"#4#5"{Int64}
│ %203 = Core.getfield(%202, :v)::Int64
│ %204 = Core.getfield(%201, :term2)::var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64}
,var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{v
ar"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int
64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7
"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{
Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#
6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#
5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#4#5"{Int64}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
│ %205 = Core.getfield(%204, :term1)::var"#4#5"{Int64}
│ %206 = Core.getfield(%205, :v)::Int64
│ %207 = Core.getfield(%204, :term2)::var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64}
,var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{v
ar"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int
64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7
...
,var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{v
ar"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int
64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7
"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{
Int64},var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#4#5"{Int64}}}}}}}}}}}}}}}}}}}}}}}}
│ %229 = Core.getfield(%228, :term1)::var"#4#5"{Int64}
...
{Int64},var"#6#7"{var"#4#5"{Int64}
,var"#4#5"{Int64}}}}
│ %289 = Core.getfield(%288, :term1)::var"#4#5"{Int64}
│ %290 = Core.getfield(%289, :v)::Int64
│ %291 = Core.getfield(%288, :term2)::var"#6#7"{var"#4#5"{Int64},var"#6#7"{var"#4#5"{Int64},var"#4#5"{Int64}}}
│ %292 = Core.getfield(%291, :term1)::var"#4#5"{Int64}
│ %293 = Core.getfield(%292, :v)::Int64
│ %294 = Core.getfield(%291, :term2)::var"#6#7"{var"#4#5"{Int64},var"#4#5"{Int64}}
│ %295 = Core.getfield(%294, :term1)::var"#4#5"{Int64}
│ %296 = Core.getfield(%295, :v)::Int64
│ %297 = Core.getfield(%294, :term2)::var"#4#5"{Int64}
│ %298 = Core.getfield(%297, :v)::Int64
│ %299 = Base.add_int(%296, %298)::Int64
│ %300 = Base.add_int(%293, %299)::Int64
│ %301 = Base.add_int(%290, %300)::Int64
│ %302 = Base.add_int(%287, %301)::Int64
│ %303 = Base.add_int(%284, %302)::Int64
...
│ %395 = Base.add_int(%8, %394)::Int64
│ %396 = Base.add_int(%5, %395)::Int64
│ %397 = Base.add_int(%2, %396)::Int64
└── return %397
) => Int64

const evaluator_tf =
    let constant(v::Int) = v,
        add(l::Int, r::Int) = l + r
        SYM(constant, add)
    end

evaluate_tf(term) = term(evaluator_tf)

function sum_of_ints_tf(n)
    if n == 1
        return constant(1)
    else
        return add(constant(n), sum_of_ints_tf(n - 1))
    end
end

julia> @btime evaluate_tf(sum_of_ints_tf(100))
  19.401 μs (198 allocations: 43.25 KiB)
5050

julia> @btime evaluate(sum_of_ints(100), Dict{String, Int}())
  2.289 μs (202 allocations: 5.23 KiB)
5050

when preallocating the data:

# naive approach
julia> @btime evaluate($(sum_of_ints(100)), Dict{String, Int}())
  801.099 ns (4 allocations: 608 bytes)
5050

# tagless final approach
julia> @btime evaluate_tf($(sum_of_ints_tf(100)))
  6.999 ns (0 allocations: 0 bytes)
5050
1 Like

I find the tagless final approach interesting. But I still don’t get how it’s different from Mason’s approach, from the point of view of the compiler. A closure (or rather its type) in Julia introduces as many type parameters as the number of variables that it has to capture:

julia> f(x, y) = z -> x + y + z
f (generic function with 1 method)

julia> typeof(f(1, 1.0))
var"#27#28"{Int64,Float64}

It’s also visible in your @code_typed output.

Yes, but the type parameters are automatically introduced, which I mentioned before as an advantage:

Let’s get back to the topic: why is the naive solution faster?
Shall we invite someone who’s an expert in the internals to answer it?

I think it’s just inlining heuristics. Avoiding indirection through evaluate(e::Expression, env) like this makes the closed case as fast as the naive case:

evaluate(e::Add, env) = evaluate(e.lhs.ctor, env) + evaluate(e.rhs.ctor, env)

Oh, a “non-zero-cost abstraction”

1 Like