Edit: See comment 55 for better draft pseudo-code.
Drafting some rough psuedo-code of the proposal I’ve been imagining for Idea 1, barring any egregious oversight, it appears to be O(n) in the number of expressions in the function body. I’d appreciate a sanity check.
Draft Pseudo-Code of Idea 1:
Define assigns
to take an expression and a symbol x
. Return true if it or any of its subexpressions (recursive) makes any assignment to x
[excluding nested scopes with their own local x
]—i.e., check for Expr(:(=), x, ...)
and similar assignments, as well special-cased handling of :tuple
assignments. Otherwise, return false.
With prior knowledge that a local variable x
is captured by an anonymous closure, decide whether to box x
by the following procedure (true for box, false for no-box):
- Check inside body of closure. If it
assigns
tox
, return true. - Set
parent
as closure’s parent expression,child
as closure expression. - While true: # climb tree, search neighbors for assignments
a. Ifparent
is the body of afor
orwhile
loop, and ifx
is bound to an outer scope, check subexpressions ofparent
that preceedchild
. If anyassigns
tox
, return true.
b. Ifparent
is afor
orwhile
loop, and ifx
is bound to an outer scope, check ifparent
’s conditionassigns
tox
. If so, return true.
c. Ifparent
is the.args[2]
of aif
orelseif
conditional, continue. # next branch is syntactically mutually exclusive.
d. Ifparent
is atuple
expression, check if itassigns
tox
. If so, return true.
e. Check subexpressions ofparent
that succeedchild
. If anyassigns
tox
, return true.
f. Ifparent
is the scopex
is bound to, break.
g. Setchild=parent
andparent
to its own parent and continue loop. - Return false.
The basic gist is: if the captured variable has any assignment expression in or after the closure’s declaration, or before it if in a loop, and discounting branches that are syntactically unreachable from the closure’s declaration, then box it—but otherwise don’t. Allowing for identifiers to be rebound unboxed before the closure is declared (and only boxing if identifiers can be rebound after closure declaration) should cause a good number of boxes to go away (such as this, this, this, this, this, this, this, this, this, this, this, this, and these, not to mention the one from Performance Tips as well as most others that I’ve encountered), and barring any mistakes [and excluding @goto
], afaict it should be a strict improvement over current behavior.
Notes:
- There may be a way to accommodate
@goto
—initial thoughts suggest it’s still O(n), though it seems likely to require more computation—but I’ll defer those thoughts for later until I gain confidence in this one. - Local named functions also present a unique challenge, but on initial thought they too seem doable. I can think of some degenerate behavior, but it coincides with existing degenerate behavior so introduces no new concern.
- Local recursive named functions deserve consideration.