Edit: See comment 55 for better draft pseudo-code.
Drafting some rough psuedo-code of the proposal I’ve been imagining for Idea 1, barring any egregious oversight, it appears to be O(n) in the number of expressions in the function body. I’d appreciate a sanity check.
Draft Pseudo-Code of Idea 1:
Define assigns to take an expression and a symbol x. Return true if it or any of its subexpressions (recursive) makes any assignment to x [excluding nested scopes with their own local x]—i.e., check for Expr(:(=), x, ...) and similar assignments, as well special-cased handling of :tuple assignments. Otherwise, return false.
With prior knowledge that a local variable x is captured by an anonymous closure, decide whether to box x by the following procedure (true for box, false for no-box):
- Check inside body of closure. If it
assignstox, return true. - Set
parentas closure’s parent expression,childas closure expression. - While true: # climb tree, search neighbors for assignments
a. Ifparentis the body of afororwhileloop, and ifxis bound to an outer scope, check subexpressions ofparentthat preceedchild. If anyassignstox, return true.
b. Ifparentis afororwhileloop, and ifxis bound to an outer scope, check ifparent’s conditionassignstox. If so, return true.
c. Ifparentis the.args[2]of aiforelseifconditional, continue. # next branch is syntactically mutually exclusive.
d. Ifparentis atupleexpression, check if itassignstox. If so, return true.
e. Check subexpressions ofparentthat succeedchild. If anyassignstox, return true.
f. Ifparentis the scopexis bound to, break.
g. Setchild=parentandparentto its own parent and continue loop. - Return false.
The basic gist is: if the captured variable has any assignment expression in or after the closure’s declaration, or before it if in a loop, and discounting branches that are syntactically unreachable from the closure’s declaration, then box it—but otherwise don’t. Allowing for identifiers to be rebound unboxed before the closure is declared (and only boxing if identifiers can be rebound after closure declaration) should cause a good number of boxes to go away (such as this, this, this, this, this, this, this, this, this, this, this, this, and these, not to mention the one from Performance Tips as well as most others that I’ve encountered), and barring any mistakes [and excluding @goto], afaict it should be a strict improvement over current behavior.
Notes:
- There may be a way to accommodate
@goto—initial thoughts suggest it’s still O(n), though it seems likely to require more computation—but I’ll defer those thoughts for later until I gain confidence in this one. - Local named functions also present a unique challenge, but on initial thought they too seem doable. I can think of some degenerate behavior, but it coincides with existing degenerate behavior so introduces no new concern.
- Local recursive named functions deserve consideration.