Naïvely written Julia should be performant. It shouldn’t take memorizing a 35-page manual to reliably beat Matlab. Unless more performance footguns are removed, it’s impossible for a package ecosystem to be performant. For a language whose raison d’etre is performance, it is an existential imperative to do better. (We also need a performance tips cheatsheet that fits on a single page.)
Let’s talk about the infamous performance of captured variables in closures · Issue #15276. This comment decisively settled the debate: the final solution would be to rewrite the lowering stage in Julia, so that lowering could leverage Julia’s type inference system.
However, according to this comment,
There are no plans to port lowering to Julia.
Okay. So we just have to live with performance occasionally getting KO’d, and rely on @code_warntype
and JET.jl to avoid yuge performance hits?
Maybe not. (?) Here are four half-baked ideas that have been rattling around my noggin, and I request your assistance to chime in and help work out the feasibility and details:
Idea 1: Don’t box variables that aren’t reassigned in/after closure def’n
Variables are often boxed over-zealously when they don’t need to be. It’s because the rule is too simple: if a capture is ever syntactically reassigned, or if it’s assigned within a conditional, then it’s boxed. But this misses the point of boxing: to cover the case of a variable that could be reassigned during the closure’s lifetime, so that changes are visible at all scopes that capture it (otherwise you’d just use a normal, unboxed, immutable capture). Most of the time, (re)assignment occurs before the closure definition (which is obviously before its lifetime), and assignment after the closure definition simply doesn’t occur but we box it anyway.
Consider the example from Performance Tips:
function abmult(r::Int)
if r < 0
r = -r
end
f = x -> x * r
return f
end
In this example, r
gets boxed in an untyped mutable Core.Box
so accessing it is type-unstable. (Unpopular opinion: It shouldn’t be so easy to inadvertently trash your performance!) According to the manual, because the parser doesn’t have access to type information (emphasis mine),
the parser does not know that
r
has a fixed type (Int
). nor thatr
does not change value once the inner function is created (so that the box is unneeded).
The parser does know that the box is unneeded, because r
is never syntactically reassigned after f
is declared! (there is never an Expr(:(=), :r, ...)
in any branch henceforth.) There is simply no way for r
’s value to change during f
’s lifetime, so the box is a waste and we’re pissing away clock cycles for nothing. This example from the manual is a perfect case study in why the current boxing policy is too aggressive.
It’s a very common pattern—to assign a variable several times near the beginning of a function during some sort of “initialization,” and then capture it but never change it again—many languages have trained us into this pattern, and sometimes it takes a lot of effort to avoid, so it’s a huge performance trap for newbies (and even regulars) to Julia. If lowering just got a little smarter about this, I reckon it’d eliminate about 80% of non-global Core.Box
es.
Idea 2: Type-Parameterize Core.Box
Of course, we can’t avoid boxes forever. But as demonstrated here, once it has been determined that a variable should be boxed, type-parameterizing the box will allow significant performance gains (beyond simply typeassert
ing each access of an untyped container, which is what currently happens when an identifier is type-annotated).
A type-parameterizable Core.Box
could look something like this (adapted from boot.jl line 378):
mutable struct Box{C}
contents::C
Box{C}(c) where C = new{C}(c)
Box(@nospecialize(c)) = new{Any}(c)
Box{C}() where C = new{C}()
Box() = new{Any}()
end
Notice that the default type is Any
, so it’s backwards-compatible.
Once Box
is type-parameterizable, then for any boxed variable with a type annotation (e.g. a::Int
), the lowering stage can parameterize the box accordingly.
Maybe we can avoid type-annotation more though:
Idea 3: Type-Inferring Boxes
It’ll be a while until lowering is ever re-written in Julia, but maybe we can just have lowering emit Julia code that will do the job? It could work something like this:
On every expression where a local boxed variable is assigned, keep a record of what functions are called, on what symbols and literals. Construct an expression that will calculate the resulting type when the outer function is called, and insert that into the box’s type parameter.
For example, if it has been determined that a
is going to be boxed, then when this assignment is encountered (where x
is an argument or global const, and thus is type-stabilized):
a = x + 1
construct an expression like this:
atype1 = :(Core.Compiler.return_type(Tuple{typeof(+), typeof(x), typeof(1)}))
and when seeing an expression like this
a = a - 1.0
construct an expression like this:
atype2 = :(return_type(Tuple{typeof(-), $atype1, typeof(1.0)}))
Construct an expression like this every time a syntactical assignment is encountered.
(if reassignment occurs inside a closure or loop, ideally make an expression that calls a recursive function—the basic gist is to recurse until the typejoin
ed type converges.)
Assemble those expressions into a single typejoin
call, and insert that into the box’s type parameter when declaring the boxed variable. Something like this for example:
atypes = [atype1, atype2, atype3]
emit(:( a = Box{typejoin($(atypes...))}() ))
So, for example, the following function declaration:
function foo(x)
if x > 0; a = x else a = -x end
() -> (a = a - 1; a)
end
could be lowered to (something resembling):
struct var"#13#14"{A}<:Function
a::Core.Box{A}
end
(f::var"#13#14")() = (f.a.contents = f.a.contents - 1; f.a.contents)
function foo(x)
if x > 0; a = x else a = -x end
a = Box{typejoin(typeof(a), return_type(Tuple{typeof(-), typeof(a), typeof(1)}))}(a)
var"#13#14"(a)
end
(pretend I’ve written a recursive type inferencing function though.)
Notice that boxing is delayed until right before the closure is declared, so as to minimize the likelihood that the box type will be abstract at the moment it’s captured.
This concept dumps a bunch of type inferring code into the function body, but if we’re lucky it should all constant-fold for zero runtime cost.
Idea 4: Avoid Boxing for Closures which Immediately Execute
If a closure is returned or if it’s passed as an argument to another function, then its lifetime is unknown so it’s appropriate to box its captures. But for closures which execute immediately and aren’t stored, there isn’t any use in boxing. There are times when we can know syntactically that this will occur.
A first example of this is comprehensions. Array comprehensions offer a syntactical assurance that the closure will immediately execute and that it will not be stored. As a result, there is simply no need to box its captures—even if they’re modified before and after the comprehension has comprehended.
A second—albeit less common—example is for functions which are introduced just to provide an inference barrier. Occasionally it’s useful to write a function and immediately execute it, simply to have the compiler create a type-stable body of code within a sea of instability, like this example. In these cases, too, it can be determined from the syntax that the need for boxes is obviated since the closure will be executed immediately and only once, if the closure definition is the first argument of a :call
expression.
Inner functions can also be declared and called in a loop, although I suspect that’s even less common.
Thoughts please! Would these ideas work? What would it take? What are the gotchas? Am I making any glaring and egregious errors?
Here are some difficulties I can imagine:
Idea 1:
- If a closure captures a variable that is only declared in a conditional branch that doesn’t execute: currently it throws an
UndefRefError
from attempting to access an uninitialized box, but it’ll now throw anUndefVarError
instead. - It might take some thought to consider all the cases where it can be determined a variable is never reassigned after the closure declaration. For example, in this case the closure is declared in the same branch as the variable it encloses; the logic ought to be able to recognize that the branches are mutually exclusive, and therefore not create a box.
Idea 2:
- This should be straightforward, and having type-annotations dictate box type-parameterizations should be an easy and immediate win.
Idea 3:
- If the closure assigns a value to its capture which depends on an argument without type annotation, then I don’t see a way for its type to be known to the closure’s parent scope; a
Box{Any}
will be needed. - Similar if the closure assigns a value to its capture which depends on an untyped global.
- If the closure assigns a value to its capture which depends on a function, and the function definition changes so that it outputs a new type during the closure’s lifetime, what should happen? Currently this is allowed, because the capture is always boxed and type-unstable. But I think it’s worth breaking; I don’t think accommodating this behavior is worth the type-instability. If changing the return type of a function that influences a closure’s capture types would be permitted during the closure’s lifetime, the user should specify that explicitly with a
Ref{Any}
. - If the closure assigns a value to its capture which depends on its previous value, then the type-inferencing code needs to be recursive.
- If the closure assigns a value to its capture
a
which depends on another captureb
, and then assigns a value tob
that depends ona
, and if there’s another closure which captures the same variables and performs its own assignments to them, then what?? It’s giving me a headache trying to figure out where to start. - It feels a bit like I’m trying to do too much: Julia already has a bunch of parsing and inference machinery. Can we lean on it more here?
Idea 4:
- FastClosures.jl does almost the right thing, although there’s more to say about the proper way to handle closures that make assignments to their captures (it definitely doesn’t do the right thing with operators like
+=
).