Understanding `*Bindings` in `JuliaLowering.jl`

Hi everyone! I am trying to understand the building pieces of JuliaLowering.jl. I am very new to compilers so I apologise in advance if my questions have very obvious answers.

I don’t understand the behaviour of Bindings in the contexts created at each lowering step. Starting from this script I have a bunch of questions:

text = """
x::Int = 10
"""

in_mod = Module()
ex = parseall(JuliaLowering.SyntaxTree, text)

ctx1, ex_macroexpand = JuliaLowering.expand_forms_1(in_mod, ex)
ctx2, ex_desugar = JuliaLowering.expand_forms_2(ctx1, ex_macroexpand)
ctx3, ex_scoped = JuliaLowering.resolve_scopes(ctx2, ex_desugar)
ctx4, ex_converted = JuliaLowering.convert_closures(ctx3, ex_scoped)
ctx5, ex_compiled = JuliaLowering.linearize_ir(ctx4, ex_converted)
  1. Should I expect ctx* to have a BindingInfo for x? I thought yes, but there is none. Why is that?

  2. While there’s no binding for x, there is a binding for the return value (I think?), return_tmp: JuliaLowering.BindingInfo(1, "return_tmp", :local, 26, nothing, nothing, 0, false, true, false, true, true, false, false). This only appears in the linearize_ir step, but it seems that the other contexts are modified as well after each pass (ctx1 doesn’t have that binding after running expand_forms_1 but it does have it after running linearize_ir). What is the advantage of not having the contexts independent from each other?

  3. What is the (theoretical) difference between a lambda binding and a closure binding? My understanding is that lambda bindings are bindings for lambda functions parameters (x = 2 is a lambda binding in the context of the function x -> x + 3 being called with the argument 2), while closure bindings are bindings for variables captured by a closure (y = 1 is a closure binding in the context of the function f(x) = x + y with y defined somewhere above and it has the value 1 assigned to it). Is this correct? Does the concept of “binding” refer to a name → value association in a particular context, or does it describe something more complex? (For example, is the closure binding the association from y to 1 in the context of the function f(x) = x + y with y = 1, or is it the entire closure, or something different?)

  4. Could you maybe point me somewhere I can find some explanations as to how the syntax graph changes from pass to pass? It is quite difficult for me to see this from the source code and I also can’t seem to come up with clarifying examples.


For using the package I followed the instructions in the README.

julia> versioninfo()
Julia Version 1.12.0-DEV.631
Commit a4e793ee31* (2024-05-30 19:25 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin24.3.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LLVM: libLLVM-17.0.6 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Hey these are great questions about subtle issues! I’ll try to answer these in multiple replies as I don’t have a big block of time/attention right now.

The purpose of SyntaxGraph is as a backing data structure for the trees which are passed between lowering passes. It’s essentially an optimization which allows us to avoid linked data structures, and provide some fancy features without type instability:

  • Arbitrary named attributes on the trees, which can be accessed in a type stable way
  • Ability for passes to independently add and remove named attributes, thus separating pass internals from the attributes it creates and receives.

The current design does come with some downsides, in that the SyntaxGraph itself is always being mutated and needs to be passed along when creating new trees. We’ll see whether this is the right tradeoff as we go along - I think it should be easy to swap an alternative tree system in if we like :slight_smile:

The graph structure of SyntaxGraph is irrelevant to the functionality of lowering - it’s the SyntaxTree objects which are passed around which matter. If your curious, SyntaxGraph happens to be a forest of trees due to the way that lowering works.

2 Likes

Yes, you should expect the list of bindings to contain a binding for x after the resolve_scopes pass has run. Here’s what I get:

julia> ctx3.bindings.info
10-element Vector{JuliaLowering.BindingInfo}:
 BindingInfo(1, "x", :global, 18, mod=Main.M, type=#₂/Int, n_assigned=1)
 BindingInfo(2, "Int", :global, 19, mod=Main.M)
 BindingInfo(3, "binding_type", :local, 33, is_ssa=true, is_internal=true)
 BindingInfo(4, "tmp", :local, 40, is_always_defined=true, is_internal=true)
 BindingInfo(5, "type_tmp", :local, 42, is_ssa=true, is_internal=true)
 BindingInfo(6, "tmp", :local, 58, is_ssa=true, is_internal=true)
 BindingInfo(7, "tmp", :local, 67, is_ssa=true, is_internal=true)
 BindingInfo(8, "tmp", :local, 71, is_ssa=true, is_internal=true)
 BindingInfo(9, "tmp", :local, 77, is_ssa=true, is_internal=true)
 BindingInfo(10, "tmp", :local, 82, is_ssa=true, is_internal=true)

This binding is only filled in after resolve scopes, because it’s the job of the resolve scopes pass to assign a unique binding ID to every symbolic name. As you can see, we’ve detected that x here is global and belongs to the module Main.M and is of type #₂/Int which is BindingId 2 which is Main.Int.

Various lowering passes create temporary bindings where necessary, including linearize_ir. These are often used when the latter compiler passes would work better with a “slot” rather than an ssa value. They have names, but the BindingId is what’s important and is the primary key in the expression tree data structure - the name is only for debugging.

There is no semantic advantage to having the list of Bindings updated as the passes proceed. Indeed, this is a confusing mutation of the data structure! However, the list of bindings is only ever added to or enhanced with metadata, so it’s somewhat safe and convenient to do this in-place. and certainly comes with a slight performance advantage as we don’t need to copy the list of bindings on each pass.

1 Like

The purpose of a binding ID is to replace ambiguous names with unique integer identifiers, according to the rules of lexical scope.

For example, in

function f(x)
    function g(x)
        x = 10
        print(x)
    end
    print(x)
end

we have two different independent xs which are formal arguments to the nested functions. Assigning to one does not affect the other.

BindingID captures this notion of identity for variable names inside nested lexical scopes and arising from macros. It’s not about values which might be bound to these names.

Conceptually, every binding is defined within some lambda, where we consider top level code to be a lambda with zero arguments (a “thunk”) along with special scope rules. The global information about a Binding goes into BindingInfo. Bindings may then be captured and used by individual lambda forms - such metadata goes into LambdaBindingInfo, which records how the binding was used within the current lambda.

ClosureBindings exists to gather information about all lambdas with the same name together into one generic function definition, so that locally defined functions can have fields for all their captures as might be needed when any of the methods are called. This is a bit confusing sorry - if it doesn’t make sense I can illustrate with an example.

1 Like

Thank you for the very detailed explanations! I have a much clearer understanding now, though still very much incomplete.

Yes, you should expect the list of bindings to contain a binding for x after the resolve_scopes pass has run.

That’s what I thought, but I’m getting nothing. Am I missing something here?

julia> text = """
       x::Int = 10
       """;

julia> ex = parseall(JuliaLowering.SyntaxTree, text);

julia> ctx1, ex_macroexpand = JuliaLowering.expand_forms_1(Main, ex);

julia> ctx2, ex_desugar = JuliaLowering.expand_forms_2(ctx1, ex_macroexpand);

julia> ctx3, ex_scoped = JuliaLowering.resolve_scopes(ctx2, ex_desugar);

julia> ctx3.bindings.info
JuliaLowering.BindingInfo[]

julia> ctx4, ex_converted = JuliaLowering.convert_closures(ctx3, ex_scoped);

julia> ctx4.bindings.info
JuliaLowering.BindingInfo[]

julia> ctx5, ex_compiled = JuliaLowering.linearize_ir(ctx4, ex_converted);

julia> ctx5.bindings.info
1-element Vector{JuliaLowering.BindingInfo}:
 BindingInfo(1, "return_tmp", :local, 26, is_ssa=true, is_internal=true)

This is a bit confusing sorry - if it doesn’t make sense I can illustrate with an example.

An example would be very helpful :sweat_smile:, please!

Ooh, the problem here is that you’ve used parseall, but that makes a form of kind K"toplevel". In general top level forms (from file scope) can’t be lowered as a whole, because latter parts of them might depend on macros which are defined earlier on. It works as expected if you use parsestmt, or if you expand each subexpression of ex individually :slight_smile: