In a previous post, I mentioned a small interpreter that I wrote both in direct style and in continuation-passing style. In the first case, Julia very quickly compiles the interpreter methods as they are being used. In the second case, however, this process takes inordinate amounts of time even for trivial inputs and, for simple but non-trivial inputs, it does not even finish because it exhausts the memory (32 GB).
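(For concreteness, here is what I mean by the two styles, using a trivial toy function rather than the actual interpreter:)

```julia
# Direct style: the function returns its result to the caller.
add_direct(a, b) = a + b

# Continuation-passing style (CPS): the function never returns a result
# directly; it passes the result to an explicitly supplied continuation.
add_cps(a, b, cont) = cont(a + b)

add_direct(1, 2)            # 3
add_cps(1, 2, identity)     # 3
add_cps(1, 2, r -> 10 * r)  # 30
```

Note that in CPS every continuation is a closure with its own type, which I suspect is related to the amount of specialization work the compiler ends up doing.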
Given the size of the interpreter, it was not practical to show it here, so I tried to create an MWE. This was not easy, because even small changes to the interpreter code dramatically changed the compilation time. After quite a bit of effort, I managed to produce a tiny interpreter that already demonstrates the problem, albeit at a smaller scale. Here it is:
eval_expr(expr, env, cont) =
    if expr isa Number || expr isa LineNumberNode
        cont(expr)
    elseif expr isa Symbol
        eval_name(expr, env, cont)
    elseif expr isa Expr && expr.head === :if
        eval_if(expr, env, cont)
    elseif expr isa Expr && expr.head === :block
        eval_begin(expr, env, cont)
    else
        eval_call(expr, env, cont)
    end

eval_name(expr, env, cont) = cont(eval(expr))

eval_if(expr, env, cont) =
    eval_expr(expr.args[1], env, r ->
        r ?
        eval_expr(expr.args[2], env, cont) :
        eval_expr(expr.args[3], env, cont))

eval_begin(expr, env, cont) =
    let exprs = expr.args,
        eval_sequentially(i) =
            i == length(exprs) ?
            eval_expr(exprs[i], env, cont) :
            eval_expr(exprs[i], env, r -> eval_sequentially(i + 1))
        eval_sequentially(1)
    end

eval_call(expr, env, cont) =
    eval_expr(expr.args[1], env, func ->
        eval_exprs(expr.args[2:end], env, args ->
            cont(func(args...))))

eval_exprs(exprs, env, cont) =
    length(exprs) == 0 ?
    cont([]) :
    eval_expr(exprs[1], env, r ->
        eval_exprs(exprs[2:end], env, rs -> cont([r, rs...])))
With the interpreter written in continuation-passing style, even trivial inputs seem to demand too much effort from the Julia compiler. Here are a few examples:
@time eval_expr(Meta.parse("1+2"), false, identity)
@time eval_expr(Meta.parse("1+2*3"), false, identity)
@time eval_expr(Meta.parse("1+2*3/4"), false, identity)
@time eval_expr(Meta.parse("1 > 2 ? 1 : 0"), false, identity)
@time eval_expr(Meta.parse("1+2 < 2+3 ? (3*4; 4*5) : 5*6"), false, identity)
Despite the relatively simple inputs, I get the following times (Julia 1.2.0, Intel i7-6700K, 32 GB RAM):
1.545639 seconds (5.65 M allocations: 364.698 MiB, 4.91% gc time)
2.273712 seconds (8.76 M allocations: 573.199 MiB, 5.33% gc time)
4.096212 seconds (15.80 M allocations: 1.007 GiB, 6.11% gc time)
6.431456 seconds (23.77 M allocations: 1.533 GiB, 6.70% gc time)
8.575267 seconds (31.03 M allocations: 1.976 GiB, 5.30% gc time)
This is already getting painful. However, if I redefine the code and then run just the final test, I get:
19.221149 seconds (71.77 M allocations: 4.613 GiB, 6.80% gc time)
This does not look like an acceptable time for interactive experimentation. OTOH, this is just an MWE: in the actual interpreter, the compilation process does not even finish.
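A quick way to see that the time is dominated by compilation rather than execution is to time the same call twice; the second call reuses the already-compiled method. With a trivial CPS function (the same pattern applies to eval_expr, just with much larger first-call times):

```julia
inc_cps(x, cont) = cont(x + 1)

@time inc_cps(1, identity)  # first call: includes compilation of the method
@time inc_cps(1, identity)  # second call: runs the already-compiled code
```

In my tests, the second calls to eval_expr are essentially free, so it really is the compiler, not the interpreter itself, doing all the work.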
BTW, in my experiments, I noticed that the number of branches in the eval_expr function seems to be very relevant: if you remove just one, compilation times drop significantly. Similarly, the number of parameters seems to matter: you might have noticed a dummy env parameter that is not being used. If you remove it, compilation times also drop significantly.
I know that I can start injecting type declarations, promoting type stability, etc., in the hope of improving the performance, but I do not want to do that, because the interpreter was designed for pedagogical purposes: I want to show my students a direct-style interpreter, to which I incrementally add features such as different scope regimes, short-circuit evaluation, macros, lazy evaluation, and so on. To this end, I want to focus on the evaluation process and not bother my students with details that are only relevant to the Julia compiler. Moreover, I want to do this incrementally, by redefining the relevant parts of the interpreter to accommodate changes in the semantics of the interpreted language.
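For comparison, the direct-style counterpart of the mini interpreter above is short and compiles quickly. A condensed sketch of the kind of code I start from (eval_expr_ds is just an illustrative name; as in the MWE, names are resolved with eval and env is a dummy):

```julia
eval_expr_ds(expr, env) =
    if expr isa Number || expr isa LineNumberNode
        expr
    elseif expr isa Symbol
        eval(expr)
    elseif expr isa Expr && expr.head === :if
        eval_expr_ds(expr.args[1], env) ?
        eval_expr_ds(expr.args[2], env) :
        eval_expr_ds(expr.args[3], env)
    elseif expr isa Expr && expr.head === :block
        # Evaluate each subexpression, keeping the value of the last one.
        foldl((_, e) -> eval_expr_ds(e, env), expr.args; init = nothing)
    else
        # A call: evaluate the callee and the arguments, then apply.
        func = eval_expr_ds(expr.args[1], env)
        args = map(e -> eval_expr_ds(e, env), expr.args[2:end])
        func(args...)
    end

eval_expr_ds(Meta.parse("1+2*3"), false)  # 7
```

This version handles the same test inputs as the CPS one, and in my experience the first calls are fast; it is only after the CPS conversion that the compilation times explode.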
This approach was working beautifully until I got to the point where I needed to convert the interpreter to continuation-passing style and ran into the performance problems mentioned above.
Do you have any suggestions for addressing these problems?
Thanks.