[ANN] MonkeyLang.jl v0.2 -- Now with CLI + Mutable variables + While loop

In the previous post, I announced my implementation of Writing an Interpreter in Go (plus the Lost Chapter on macros) and Writing a Compiler in Go: the MonkeyLang.jl package.

Now, I want to announce some advancements I have made in the past few days.

Performance

When I first finished the compiler part, the traditional fib(35) benchmark showed that the VM version ran 3x slower than the evaluator version, which was surprising.

After some invaluable discussion with @jdad and @Chris_Foster, I found out that the main cause was the use of the abstractly typed Ref{Int}. After refactoring these to a mutable struct with a concrete Int field, the VM version became 3x faster than the evaluator version, which is about the same as the Go implementation.
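To illustrate the kind of refactoring this is (a minimal sketch, not the actual MonkeyLang.jl internals): a field typed as `Ref{Int}` is abstractly typed, because `Ref` itself is an abstract type, so every access goes through dynamic dispatch; a `mutable struct` with a concrete `Int` field compiles to plain loads and stores.

```julia
# Illustrative sketch, not the actual MonkeyLang.jl code.
struct BoxedSlot
    v::Ref{Int}   # `Ref` is abstract, so this field is not concretely typed
end

mutable struct PlainSlot
    v::Int        # concrete field type: fast loads/stores
end

bump!(s::BoxedSlot) = (s.v[] += 1)
bump!(s::PlainSlot) = (s.v += 1)

slow, fast = BoxedSlot(Ref(0)), PlainSlot(0)
for _ in 1:1000
    bump!(slow)
    bump!(fast)
end
(slow.v[], fast.v)   # both reach 1000, but the PlainSlot loop is much cheaper
```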

Some minor performance improvements have also been made, which I will not cover here.

The CLI

Using PackageCompiler.jl, MonkeyLang.jl can now be compiled into a standalone CLI executable, monkey. I hope you will enjoy it.
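For reference, building such an executable with PackageCompiler.jl looks roughly like the build script below. This is a hypothetical sketch: the paths, the output directory, and the `julia_main` entry-point name are assumptions for illustration, not taken from the actual repository.

```julia
# Hypothetical build script; assumes the package defines a `julia_main()`
# entry point returning a `Cint`, as PackageCompiler expects.
using PackageCompiler

create_app("path/to/MonkeyLang", "MonkeyApp";
           executables = ["monkey" => "julia_main"])
# The compiled binary then lands in MonkeyApp/bin/monkey.
```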

New Features

More exciting are the new features, which do not exist in the original Go version.

At first, my main goal was to implement a while loop. The lexer + parser part was not so hard. But after that, I realized that to make a while loop useful, I needed to implement one more thing: mutable variables.

I refactored the parser to support two different types of LetStatements (maybe I should rename it to AssignStatement): one with let (defining a new variable), and one without let (modifying an existing variable).

On the evaluator side, things were easy, since scopes were handled elegantly by nested Environments. On the compiler side, however, the existing design could not fully support my requirements. I wanted a while statement to create a new scope while still being able to modify variables in outer scopes (so that the condition can eventually evaluate to false after some iterations). But we only had OpSetLocal and OpSetGlobal, which were not sufficient. Implementing OpSetFree on the basis of the existing OpGetFree was not hard, but that still could not meet my needs.

In the final design, the body of a while statement is a special Closure that is called repeatedly as long as the condition remains true. This special Closure differs from normal Closures in that it does not capture free variables into a FreeScope and store them; instead, it captures free variables in an OuterScope and stores pointers to them. Here, a pointer (called SymbolPointer in my implementation) has three properties: the nesting level, the scope of the original symbol, and the index of the original symbol. I further implemented OpGetOuter and OpSetOuter, which take three operands each. When we need to get or set an outer symbol, we use the level information to walk back to the (end - level)-th frame and fetch the information there.
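The pointer resolution described above can be sketched as follows. The name `SymbolPointer` and its three properties come from the post; the frame representation is a hypothetical simplification, not the actual MonkeyLang.jl code.

```julia
# Sketch of the OuterScope pointer idea (simplified, hypothetical frames).
struct SymbolPointer
    level::Int     # how many frames to walk back from the current one
    scope::Symbol  # scope of the original symbol (e.g. :local, :free)
    index::Int     # slot index of the original symbol within that scope
end

# OpGetOuter/OpSetOuter use `level` to reach back to the (end - level)-th
# frame and read or write the slot there:
get_outer(frames, ptr::SymbolPointer) = frames[end - ptr.level][ptr.index]
set_outer!(frames, ptr::SymbolPointer, val) =
    (frames[end - ptr.level][ptr.index] = val)

frames = [[10, 20], [30, 40]]       # two frames, innermost last
ptr = SymbolPointer(1, :local, 2)   # second slot, one frame back
get_outer(frames, ptr)              # → 20
set_outer!(frames, ptr, 99)         # the outer frame becomes [10, 99]
```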

The result can be seen in the following example.

The Julia version:

x = 5
y = 0

while x > 0
    global x, y
    x = x - 1
    z = 5
    function f()
        w = 1
        while z > 0
            w = w * 2
            y = y + z * w
            z = z - 1
            f()
        end
    end

    f()
end

println(y)

And the Monkey version:

let x = 5;
let y = 0;
while (x > 0) {
    x = x - 1;
    let z = 5;
    let f = fn() {
        let w = 1;
        while (z > 0) {
            w = w * 2;
            y = y + z * w;
            z = z - 1;
            f();
        }
    }
    f();
}
puts(y);

They both output 150. Amazing!

The Monkey version can be run with the CLI:

./monkey run test.mo --vm (VM version)
./monkey run test.mo (evaluator version)


With this update, we can now write more efficient fib functions.

For example,

Memoized

let d = {};

let fibonacci = fn(x) {
    if (x == 0) {
        0
    } else {
        if (x == 1) {
            1;
        } else {
            if (type(d[x]) == "NULL") {
                let g = fibonacci(x - 1) + fibonacci(x - 2);
                d = push(d, x, g);
            }

            d[x];
        }
    }
}

fibonacci(35);

Dynamic Programming

let fibonacci = fn(x) {
    let i = 2;
    let a = 0;
    let b = 1;

    while (!(i > x)) {
        let c = a + b;
        a = b;
        b = c;
        i = i + 1;
    }

    b;
}

fibonacci(35);

And here is the new benchmark:

=== Using evaluator ===
  24.350 s (550245996 allocations: 36.68 GiB)
=== Using compiler ===
  8.608 s (371082011 allocations: 7.75 GiB)
=== Using evaluator (memoized) ===
  242.496 μs (4729 allocations: 233.55 KiB)
=== Using compiler (memoized) ===
  237.833 μs (6403 allocations: 303.83 KiB)
=== Using evaluator (dp) ===
  90.001 μs (1932 allocations: 90.67 KiB)
=== Using compiler (dp) ===
  90.756 μs (2353 allocations: 87.08 KiB)

But the final winner is a simple tail recursion, without using while loops or variable reassignment.

let fibonacci = fn(x, a, b) {
    if (x == 1) {
        b
    } else {
        fibonacci(x - 1, b, a + b)
    }
};

fibonacci(35, 0, 1);

which runs in

=== Using evaluator (tailrec) ===
  39.902 μs (924 allocations: 37.64 KiB)
=== Using compiler (tailrec) ===
  47.634 μs (1049 allocations: 46.17 KiB)

Just added something interesting: a Julia transpiler. Now Monkey can use Julia as its backend! @jdad @Chris_Foster

The transpiler transpiles Monkey AST to a Julia Expr, with a bit of code added for compatibility.

The behavior is almost the same as “native” Monkey, except that there is no reassignment validation (symbols can be redefined). Macros are not transpiled to Julia macros, though; they are handled by the original macro-expansion functions.

As an example, the fib(35) Monkey code below:

let fibonacci = fn(x) {
    if (x == 0) {
        0
    } else {
        if (x == 1) {
            return 1;
        } else {
            fibonacci(x - 1) + fibonacci(x - 2);
        }
    }
};

fibonacci(35);

transpiles to (unrelated code omitted)

# Preludes...

let
    try
        function fibonacci(x)
            if x == 0
                0
            else
                if x == 1
                    1
                else
                    fibonacci(x - 1) + fibonacci(x - 2)
                end
            end
        end
        fibonacci(35)
    catch e
        # Error handling...
    end
end

However, the transpiled code runs more than 10x slower than the same function defined directly in Julia, as the benchmark below shows. The second entry shows that the transpilation itself takes very little time.

=== Julia native ===
  39.196 ms (0 allocations: 0 bytes)
=== Transpile Monkey to Julia ===
  73.750 μs (2003 allocations: 83.73 KiB)
=== Using Julia as the Backend (naive) ===
  481.897 ms (37729 allocations: 920.98 KiB)

I did some further tests and found that this was related to scopes.

Compare the following two fib definitions:

function global_fib(x)
    if x == 0
        0
    else
        if x == 1
            1
        else
            global_fib(x - 1) + global_fib(x - 2)
        end
    end
end

local_fib = let
    function fib(x)
        if x == 0
            0
        else
            if x == 1
                1
            else
                fib(x - 1) + fib(x - 2)
            end
        end
    end
end

The benchmark result is surprising:

=== global_fib() ===
  39.223 ms (0 allocations: 0 bytes)
=== local_fib() ===
  455.592 ms (28656 allocations: 447.75 KiB)

So there must be something about Julia’s scoping and function mechanisms that I do not understand yet. Do you have any ideas?


Thanks a lot!

This is very cool! What are the semantics of (re)assignment in Monkey? Can the validation be done in a static analysis pass as part of the translation or does it need to wait until runtime?

The original Monkey does not restrict this, meaning you can do:

let a = 2;
let a = true;
let a = false;

In my MonkeyLang.jl implementation, however, it works like this:

let a = 2;
a = 3;
a = 4;
let a = 5; # Causes error: a is already defined!

For the evaluator, this check is done during the evaluation; for the bytecode VM, this is done during compilation.
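The compile-time variant of this check can be sketched with a small symbol table that rejects a second `let` of a name but permits plain reassignment. This is a hypothetical illustration, not the actual MonkeyLang.jl code; the names `define!` and `assign!` are made up.

```julia
# Hypothetical sketch of the "already defined" check.
struct SymbolTable
    defined::Set{String}
end
SymbolTable() = SymbolTable(Set{String}())

function define!(t::SymbolTable, name::String)   # `let name = ...`
    name in t.defined && error("$name is already defined!")
    push!(t.defined, name)
end

assign!(t::SymbolTable, name::String) =          # `name = ...`
    name in t.defined || error("$name is not defined!")

t = SymbolTable()
define!(t, "a")     # let a = 2;
assign!(t, "a")     # a = 3;  (ok)
# define!(t, "a")   # let a = 5;  → error: a is already defined!
```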


Right, in that case you could use the same analysis pass as the bytecode VM, but for the transpiler? If you run the analysis pass on the hierarchical representation before lowering to bytecode you can share the analysis between backends.


This is definitely something I would like to try in the future.

It would also help to form a foundation for the addition of a type system.