RFC: Some Ideas to Tackle #15276 - performance of captured variables in closures

julia> macro checkconst(expr)
           map(expr.args) do ex
               ex isa Symbol || return ex
               (Symbol = ex,
                IsConst = isconst(__module__, ex),
                Value = isdefined(__module__, ex) ?
                    getproperty(__module__, ex) :
                    Symbol("#undef#"))
           end
       end
@checkconst (macro with 1 method)

julia> const A = 4

julia> B = 7

julia> @checkconst A + B + C
4-element Vector{NamedTuple{(:Symbol, :IsConst, :Value)}}:
 (Symbol = :+, IsConst = true, Value = +)
 (Symbol = :A, IsConst = true, Value = 4)
 (Symbol = :B, IsConst = false, Value = 7)
 (Symbol = :C, IsConst = false, Value = Symbol("#undef#"))

This obviously isn’t what an actual checkconst macro would look like.
But I don’t think it is unreasonable to use isconst in this way.


That would produce spectacularly incorrect answers in the presence of local scoping.

julia> const A = 1;

julia> let A = 200000000
           @checkconst (A, )
       end
1-element Vector{NamedTuple{(:Symbol, :IsConst, :Value), Tuple{Symbol, Bool, Int64}}}:
 (Symbol = :A, IsConst = true, Value = 1)

You’re correct.



Can you explain what is meant by this?

This was/is something I want to do. But right now there’s no funding to continue working on JuliaSyntax as a job so… yeah it’s unlikely to happen any time soon given how large a project it is to rewrite all of lowering. For now I hope to just land the parser in Base.



That’s too bad. I’m sure you’ve looked into it and have some idea how much effort it would take. How many engineer-months would you estimate (ideally vs. conservatively) it might require? Is it the sort of thing that the Julia community could realistically support, or would it require outside financial support?


Edit: See comment 55 for better draft pseudo-code.

Drafting some rough pseudo-code of the proposal I’ve been imagining for Idea 1: barring any egregious oversight, it appears to be O(n) in the number of expressions in the function body. I’d appreciate a sanity check.

Draft Pseudo-Code of Idea 1:

Define assigns to take an expression and a symbol x. Return true if the expression or any of its subexpressions (recursively) makes an assignment to x [excluding nested scopes with their own local x]—i.e., check for Expr(:(=), x, ...) and similar assignments, as well as special-cased handling of :tuple assignments. Otherwise, return false.

With prior knowledge that a local variable x is captured by an anonymous closure, decide whether to box x by the following procedure (true for box, false for no-box):

  1. Check inside body of closure. If it assigns to x, return true.
  2. Set parent as closure’s parent expression, child as closure expression.
  3. While true: # climb tree, search neighbors for assignments
    a. If parent is the body of a for or while loop, and if x is bound to an outer scope, check the subexpressions of parent that precede child. If any assigns to x, return true.
    b. If parent is a for or while loop, and if x is bound to an outer scope, check if parent’s condition assigns to x. If so, return true.
    c. If parent is the .args[2] of an if or elseif conditional, continue. # the next branch is syntactically mutually exclusive.
    d. If parent is a tuple expression, check if it assigns to x. If so, return true.
    e. Check subexpressions of parent that succeed child. If any assigns to x, return true.
    f. If parent is the scope x is bound to, break.
    g. Set child=parent and parent to its own parent and continue loop.
  4. Return false.
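To make the assigns predicate concrete, here is a minimal Julia sketch (a hypothetical toy, not an implementation: it omits the nested-scope exclusion and compound assignments such as `+=` described above):

```julia
# Minimal sketch of the `assigns` predicate described above.
# Returns true if `ex` or any of its subexpressions assigns to symbol `x`.
# Omitted for brevity: excluding nested scopes that declare their own
# local `x`, and compound-assignment forms like `x += 1`.
function assigns(ex, x::Symbol)
    ex isa Expr || return false
    if ex.head === :(=)
        lhs = ex.args[1]
        lhs === x && return true
        # special-cased :tuple destructuring, e.g. (a, x) = t
        lhs isa Expr && lhs.head === :tuple && x in lhs.args && return true
    end
    return any(arg -> assigns(arg, x), ex.args)
end
```

For example, `assigns(:(x = y + 1), :x)` returns `true`, while `assigns(:(f(x)), :x)` returns `false`.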

The basic gist is: if the captured variable has any assignment expression in or after the closure’s declaration, or before it if in a loop, and discounting branches that are syntactically unreachable from the closure’s declaration, then box it—but otherwise don’t. Allowing for identifiers to be rebound unboxed before the closure is declared (and only boxing if identifiers can be rebound after closure declaration) should cause a good number of boxes to go away (such as this, this, this, this, this, this, this, this, this, this, this, this, and these, not to mention the one from Performance Tips as well as most others that I’ve encountered), and barring any mistakes [and excluding @goto], afaict it should be a strict improvement over current behavior.


  • There may be a way to accommodate @goto—initial thoughts suggest it’s still O(n), though it seems likely to require more computation—but I’ll defer those thoughts for later until I gain confidence in this one.
  • Local named functions also present a unique challenge, but on initial thought they too seem doable. I can think of some degenerate behavior, but it coincides with existing degenerate behavior so introduces no new concern.
  • Local recursive named functions deserve consideration.

This seems like a natural next step once incorporating compiler passes is fully designed and implemented (which it seems is already pretty far along from my understanding).

Once that’s in place we could play around with alternative optimizations for things like closures more easily.

Rust seems to have done a really good job of providing its own frontend to LLVM so it’s easy to dig in and improve Rust once you know the language. But they don’t concern themselves with compilation time nearly as much as Julia does, so it’s not clear how hard that would be to implement in a reasonable amount of time for Julia.


To make Idea 1 of the OP work with @goto, scrap the above idea of working with the AST and instead work closer to the IR. It seems easier overall (fewer special cases).

Draft Pseudo-Code of Idea 1 (with goto):

(assumption: the AST has been lowered to flattened IR, but no Core.Boxes have been inserted yet. There might currently be no point during lowering where this is true, but the concept could be adapted.)

With prior knowledge that a local variable x is captured by a closure, decide whether to box x by the following procedure (true for box, false for no-box):

Define assigns(n,x,checked) to take a line number n, a symbol x, and a set of previously explored line numbers checked. assigns operates as follows:

  1. If n ∈ checked, return false.
  2. Push n into checked.
  3. While true:
    a. If the expression at line n is an Expr(:(=), x, ...), return true.
    b. If the expression at line n is a :method of a closure known to capture x, loop over every sub-expression in its CodeInfo’s .code. If any is an Expr(:(=), x, ...), return true.
    c. If the expression at line n is a Core.GotoNode, return assigns(its .label, x, checked).
    d. If the expression at line n is a Core.GotoIfNot, return assigns(n+1, x, checked) || assigns(its .dest, x, checked).
    e. If the expression at line n is a Core.ReturnNode, or if there are no more expressions, return false.
    f. Increment n.

Then, initialize checked to an empty set; let n be the line at which the closure :method is defined; and if assigns(n,x,checked), then box x. If multiple closures are known to capture x, simply call assigns on them without emptying checked to save some computations.

The basic gist is: explore all expressions that are syntactically reachable in and after the closure’s declaration. If any is an assignment to the captured variable, then box it—but otherwise don’t. Keep a record checked of labels and methods that have been visited, to avoid repeated work and getting stuck in loops. The same caveats as before apply.
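Barring details, this walk could be sketched in Julia like so (a hypothetical toy over a plain statement vector, using symbols for variables; real lowered code stores slot numbers in a CodeInfo, and the closure-:method case from step b is omitted for brevity):

```julia
# Hypothetical sketch of the IR-level `assigns` walk described above,
# over a simplified vector of lowered statements.
function assigns(code::Vector, n::Int, x::Symbol, checked::Set{Int})
    n in checked && return false
    push!(checked, n)
    while n <= length(code)
        st = code[n]
        if st isa Expr && st.head === :(=) && st.args[1] === x
            return true                        # direct assignment to x
        elseif st isa Core.GotoNode
            return assigns(code, st.label, x, checked)
        elseif st isa Core.GotoIfNot           # explore both branches
            return assigns(code, n + 1, x, checked) ||
                   assigns(code, st.dest, x, checked)
        elseif st isa Core.ReturnNode
            return false
        end
        n += 1
    end
    return false
end
```

Because every line is pushed into checked before being explored (either here on entry or by a recursive call at a jump target), back-edges from loops terminate and no statement is scanned twice.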

cc @simeonschaub you seem to have a pretty good grasp of closure capture boxing; do you mind if I entice you to opine?

Hard to estimate, because the scope is less clear than you might think: the point of doing this wouldn’t be just to port the existing flisp code to Julia, because we wouldn’t get much out of that other than it being a little easier to modify lowering. To really make a difference there’s some rethinking to be done.

Primarily, how do we preserve the detailed source location information provided by JuliaSyntax through macro expansion? This is hard because Expr is the public API for source code in macros (and also the data structure for quoted code). But Expr has no fields for this information. Perhaps there’s a way to add such fields without breaking everything or causing a lot of bloat. But it’s not really clear. Second, and relatedly: how do we preserve source location through the various lowering passes in a sane way?

It’s at least several months of effort by someone who’s already a bit familiar with lowering (shorter for a demo / PoC… but from trying to replace the parser I know there will be a long, long tail of obscure bugs and compatibility things to work out). Also I should point out that I know I’m eternally optimistic and bad at estimating. So to be more realistic I’m just going to say a year’s work :-/


Once lowering is in Julia, it should be easier to attract labor to help work out larger changes; currently there’s a two-language barrier. Also, is enabling the final solution for #15276 not a worthy goal in its own right?

If we restricted the scope of work to just porting julia-syntax.scm to Julia, how much do you figure that would shorten the schedule? I know, porting/copying is far less gratifying than new creation, but let’s hypothesize we’re willing to swallow that bitter pill for the moment. (I suspect JuliaSyntax’s loftier goals—propagating source trivia past macro expansion—might require waiting for a breaking 2.0 release anyway, but in the meantime having better syntax errors and improving type stability would defo be appreciated.)

I’ve seen it stated somewhere, fairly authoritatively, that this can’t really be solved in lowering (though some aspects could still maybe be improved). I don’t pretend to have thought about closure lowering in depth, though, so I don’t have any strong opinions about this myself.

Also for the record, I was only responding to particular queries where I was mentioned in this thread, I haven’t had the time to read the whole thing.


Hmm, that’s interesting because it seems to stand at odds with this fairly authoritative comment from @tim.holy:

Because a full solution will require some redesign of the lowering stage, I’m guessing it won’t be tackled until the lowering stage is rewritten in Julia . See the *.scm files in src/ .

So I’m curious to know where the disconnect is. Tim’s comment matches my intuition, so my instinct is to believe it, despite Idea 3 of my OP being to work around it (supposing that lowering wouldn’t be ported to Julia). Can you offer a source or reference?


I have no idea about the technicalities of this problem but I see no disconnect between changes in lowering being required while not being sufficient.


Can you offer a source or reference?

Sorry, my memory fails me on this (otherwise I’d have offered a reference originally). All I know is that I did get this strong impression at one point, but I could be wrong. I’d need to read both the original issue and this thread and have a think.


It seems that first porting the flisp code to Julia without adding any features would be worthwhile to make it more accessible, but it can only be done by someone who already understands the code to some level. The fact that it was written in flisp rather than Julia in the first place makes me think that this is not trivial. On the other hand, with Julia having its roots in LISP maybe there is a translation that is close?
I find even some core Julia code hard to read, but I presume that’s because the concepts behind it are foreign, not because it is not succinct. The flisp part could be the most succinct representation of the concepts, but it not being thoroughly documented makes it impossible for a larger community to understand. So part of the translation work would probably involve documenting why the code is the way it is.


I think my comment was basically that one needs to be able to do some of the things that type-inference does, and so for as long as lowering is strictly before type-inference there’s no hope of solving it. That doesn’t mean it will be easy (or possible) even if we can go back-and-forth between the two. Others have thought about this much more than I have, though.


I suppose this depends on what we consider “sufficient.” We can obviously contrive measures by which only infinite effort would be considered sufficient, and which would demand we first solve the NP-hard class of problems, but such sufficiency measures are unreasonable.

I consider it sufficient if we eliminate any boxes that we obviously can, and we parameterize boxes with whatever concrete types we obviously can—where obviousness is measured as a function of language semantics, syntax, and function return types, without attempting to solve undecidable problems, nor demanding that lowering make decisions that would require access to runtime values, nor other unreasonably difficult, unintuitive, and brittle things. Although great strides have been taken, there’s a lot of meat still on the bone here. (The example in Performance Tips is one such case of a box that’s obviously unnecessary per language semantics.)

By this measure, I would consider the changes that are possible in lowering to be sufficient. As an example of something that such “sufficient” changes in lowering should be able to achieve, this would become type-stable (from knowledge that return_type(Tuple{typeof(+), Int, Int}) == Int and thus storing a in a Core.Box{Int}):

let a=0
    foreach(1:10) do i::Int
        a += i
    end
end

but, remove the ::Int annotation, and it’s reasonable for it to be type-unstable because it can’t be known from syntax and return_type what type i will have and thus what type a will have.

To summarize Idea 1:

  • Stop boxing captures that have no syntactically reachable assignment in/after the closure’s definition, as demonstrated in pseudo-code here. It’s a simple idea which appears to require O(n) time.
  • Because assigning a variable multiple times before capturing it, but not after, is a common pattern, the boxes which would be eliminated by Idea 1 are surprisingly numerous, including: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
  • As simple as Idea 1 is conceptually, it is still superior to the current boxing decision rule.

Absent legitimate protest, I’ll continue.

Explaining Idea 2

Again, Idea 2 does not require type inference in lowering; it simply makes Core.Box type-parameterizable, akin to Base.RefValue. The benefit can be realized immediately in the cases where a variable is type-annotated, which again doesn’t require type inference in lowering.

From Performance Tips:

if it is known that a captured variable does not change its type, then this can be declared explicitly with a type annotation (on the variable, not the right-hand side)

The type annotation partially recovers lost performance due to capturing because the parser can associate a concrete type to the object in the box.

Consider the following code, in which the programmer follows this guidance by annotating s as Int to achieve type-stability:

function f(x::Int)
    s::Int = 0
    foreach(1:x) do i; s += 1 end
    return s
end

Inspect the lowered code, and you will see that s is contained in a mutable, type-unstable Core.Box, and Int is asserted whenever s is assigned or accessed. Note that lowering doesn’t need to know what an Int is; it just needs to know that s is being annotated as one. It’s obvious that this same information could be used to type-parameterize the Box, if only Box were type-parameterizable.
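To illustrate, a type-parameterized box needn’t be any more complicated than this sketch (TypedBox is a hypothetical name for what a parameterized Core.Box could look like):

```julia
# Hypothetical sketch: a type-parameterized analogue of Core.Box.
# Today's Core.Box is effectively this struct minus the {T} parameter,
# i.e. its contents field is untyped.
mutable struct TypedBox{T}
    contents::T
    TypedBox{T}() where {T} = new()               # captured but not yet assigned
    TypedBox{T}(x) where {T} = new(convert(T, x))
end
```

With something like this, lowering could emit TypedBox{Int}(0) for the annotated s::Int, and every read of s would already be known to be an Int with no runtime assertion needed.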

To demonstrate the performance improvement to be had, consider two functions: fa, which imitates f using Ref{Any} to mimic the type-unstable Box, and fb, which uses a Ref{Int} to mimic the proposed type-parameterized Box{Int}:

function fa(x::Int)
    s = Ref{Any}(0::Int)
    foreach(1:x) do i; s[] = (s[]::Int + 1)::Int end
    return s[]::Int
end

function fb(x::Int)
    s = Ref{Int}(0)
    foreach(1:x) do i; s[] = s[] + 1 end
    return s[]
end

(Note, fa asserts ::Int whenever s is accessed or assigned, just like f. You can also add ::Int assertions into fb; those don’t matter.)

Let’s check the performance:

julia> using BenchmarkTools
       a=@btime fa($1000)
       b=@btime fb($1000)
       a ≡ b
  4.143 μs (489 allocations: 7.64 KiB)
  1.800 ns (0 allocations: 0 bytes)
true

(Note, you can run @btime f($1000) to confirm that f and fa are similar.)

Clearly, later compiler stages are heavily optimizing fb, and the same optimizations aren’t applied to fa, which has ~1000x worse performance. If we type-parameterized Core.Box as Idea 2 proposes, then f would act like fb instead of fa, and we’d unlock the full performance that type annotation should allow us.

In most cases, the benefit from type-parameterizing Box won’t be as great as in this example, but it’s often still quite substantial. For the example abmult2 in Performance Tips, the benefit is about a 30% timing improvement; other functions I’ve seen improved about 50%. As it stands, we’re leaving a lot of performance on the table.


  • This idea was proposed by @rapus95 three years ago, but it didn’t receive satisfactory consideration.
  • Similar performance improvements could be had for type-annotated globals.
  • At some point, when lowering can eventually type-infer, you’d want type-parameterized Boxes anyway—so this idea should be about as controversial as the round earth.
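Regarding the typed-globals bullet: since Julia 1.8, globals can in fact be type-annotated, which gives accesses the same kind of stability that a parameterized Box would give captures. A small example:

```julia
# Typed global (requires Julia 1.8+): the annotation makes every access
# to `counter` type-stable, with a convert only on assignment.
global counter::Int = 0

function bump!()
    global counter += 1
    return counter    # inferred as Int, no ::Int assert needed
end
```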