RFC: Some Ideas to Tackle #15276 - performance of captured variables in closures

Can you offer a source or reference?

Sorry, my memory fails me on this (otherwise I’d have offered a reference originally). All I know is that I did get this strong impression at one point, but I could be wrong. I’d need to read both the original issue and this thread and have a think.


It seems that first porting the flisp code to Julia without adding any features would be worthwhile to make it more accessible, but it can only be done by someone who already understands the code to some level. The fact that it was written in flisp rather than Julia in the first place makes me think that this is not trivial. On the other hand, with Julia having its roots in LISP maybe there is a translation that is close?
I find even some core Julia code hard to read, but I presume that's because the concepts behind it are foreign, not because it is not succinct. The flisp part may be the most succinct representation of those concepts, but since it is not thoroughly documented, it is impossible for the larger community to understand. So part of the translation work would probably involve documenting why the code is the way it is.


I think my comment was basically that one needs to be able to do some of the things that type-inference does, and so for as long as lowering is strictly before type-inference there’s no hope of solving it. That doesn’t mean it will be easy (or possible) even if we can go back-and-forth between the two. Others have thought about this much more than I have, though.


I suppose this depends on what we consider “sufficient.” We can obviously contrive measures by which only infinite effort would be considered sufficient, or which would demand we first solve NP-hard problems, but such sufficiency measures are unreasonable.

I consider it sufficient if we eliminate any boxes that we obviously can, and parameterize boxes with whatever concrete types we obviously can, where obviousness is measured as a function of language semantics, syntax, and function return types: without attempting to solve undecidable problems, demanding that lowering make decisions which would require access to runtime values, or doing other unreasonably difficult, unintuitive, and brittle things. Although great strides have been made, there’s a lot of meat still on the bone here. (The example in Performance Tips is one such case of a box that’s obviously unnecessary per language semantics.)

By this measure, I would consider the changes that are possible in lowering to be sufficient. As an example of something that such “sufficient” changes in lowering should be able to achieve, this would become type-stable (from knowledge that return_type(Tuple{typeof(+), Int, Int}) == Int, and thus that a could be stored in a Core.Box{Int}):

let a = 0
    foreach(1:10) do i::Int
        a += i
    end
end

But remove the ::Int annotation, and it’s reasonable for it to be type-unstable, because it can’t be known from syntax and return_type what type i will have, and thus what type a will have.

To summarize Idea 1:

  • Stop boxing captures that have no syntactically reachable assignment in/after the closure’s definition, as demonstrated in pseudo-code here. It’s a simple idea which appears to require O(n) time.
  • Because assigning a variable multiple times before capturing it, but not after, is a common pattern, the boxes which would be eliminated by Idea 1 are surprisingly numerous, including: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
  • As simple as Idea 1 is conceptually, it is still superior to the current boxing decision rule.
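To make Idea 1 concrete, here is a minimal sketch (the function name is hypothetical) of the pattern in question: a capture that is assigned more than once, but never in or after the closure’s definition. Current lowering still boxes it; Idea 1 would not:

```julia
# Hypothetical illustration of Idea 1's target pattern. `base` is assigned
# twice, but only *before* the closure is created; there is no syntactically
# reachable assignment in or after the closure's definition.
function make_adder()
    base = 0
    base = 10                  # reassignment, but before capture
    return i -> base + i       # `base` is read-only from here on
end

c = make_adder()
c(5)                           # returns 15

# Today, lowering boxes `base` because it is assigned more than once:
is_boxed = any(T -> T === Core.Box, fieldtypes(typeof(c)))
```

Under Idea 1, `is_boxed` would become `false`, since no assignment to `base` is reachable once the closure exists.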

Absent legitimate protest, I’ll continue.

Explaining Idea 2

Again, Idea 2 does not require type inference in lowering; it simply makes Core.Box type-parameterizable, akin to Base.RefValue. The benefit can be immediately realized in the cases where a variable is type-annotated, which again doesn’t require type inference in lowering.
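As a rough illustration of what “type-parameterizable” means here (TypedBox is a hypothetical stand-in defined for this sketch, not an actual Core or Base type), compare with Base.RefValue:

```julia
# Hypothetical sketch: a type-parameterized box, analogous to Base.RefValue.
# Idea 2 proposes that Core.Box gain a type parameter like this.
mutable struct TypedBox{T}
    contents::T
end

b = TypedBox{Int}(0)   # the field is concretely typed as Int
b.contents += 1
b.contents             # 1; field reads are type-stable, unlike Core.Box
```

Because `contents` has concrete type `Int`, later compiler stages can reason about reads and writes without runtime type checks, which is exactly what a type annotation on a captured variable should buy us.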

From Performance Tips:

if it is known that a captured variable does not change its type, then this can be declared explicitly with a type annotation (on the variable, not the right-hand side)

The type annotation partially recovers lost performance due to capturing because the parser can associate a concrete type to the object in the box.

Consider the following code, in which the programmer follows this guidance by annotating s as Int to achieve type-stability:

function f(x::Int)
    s::Int = 0
    foreach(1:x) do i; s += 1 end
    return s
end
Inspect the lowered code, and you will see that s is contained in a mutable, type-unstable Core.Box, and Int is asserted whenever s is assigned or accessed. Note that lowering doesn’t need to know what an Int is; it just needs to know that s is being annotated as one. It’s obvious that this same information could be used to type-parameterize the Box, if only Box were type-parameterizable.
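For instance, a minimal check (assuming the definition of f above) can confirm the box’s presence programmatically:

```julia
# Re-define f as above, then confirm that lowering put `s` in a Core.Box.
function f(x::Int)
    s::Int = 0
    foreach(1:x) do i; s += 1 end
    return s
end

ci = first(code_lowered(f, (Int,)))
uses_box = occursin("Core.Box", string(ci))   # `s` lives in a Box
```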

To demonstrate the performance improvement to be had, consider two functions: fa, which imitates f using Ref{Any} to mimic the type-unstable Box, and fb, which uses a Ref{Int} to mimic the proposed type-parameterized Box{Int}:

function fa(x::Int)
    s = Ref{Any}(0::Int)
    foreach(1:x) do i; s[] = (s[]::Int + 1)::Int end
    return s[]::Int
end

function fb(x::Int)
    s = Ref{Int}(0)
    foreach(1:x) do i; s[] = s[] + 1 end
    return s[]
end

(Note, fa asserts ::Int whenever s is accessed or assigned, just like f. You can also add ::Int assertions into fb; those don’t matter.)

Let’s check the performance:

julia> using BenchmarkTools
       a=@btime fa($1000)
       b=@btime fb($1000)
       a ≡ b
  4.143 μs (489 allocations: 7.64 KiB)
  1.800 ns (0 allocations: 0 bytes)

(Note, you can run @btime f($1000) to confirm that f and fa are similar.)

Clearly, later compiler stages heavily optimize fb, and the same optimizations aren’t applied to fa, which performs ~1000x worse. If we type-parameterized Core.Box as Idea 2 proposes, then f would behave like fb instead of fa, and we’d unlock the full performance that the type annotation should afford.

In most cases, the benefit from type-parameterizing Box won’t be as great as in this example, but it’s often still quite substantial. For the example abmult2 in Performance Tips, the benefit is about a 30% timing improvement; other functions I’ve seen improve by about 50%. As it stands, we’re leaving a lot of performance on the table.


  • This idea was proposed by @rapus95 three years ago, but it didn’t receive satisfactory consideration.
  • Similar performance improvements could be had for type-annotated globals.
  • At some point, when lowering can eventually type-infer, you’d want type-parameterized Boxes anyway—so this idea should be about as controversial as the round earth.