RFC: Some Ideas to Tackle #15276 - performance of captured variables in closures

uniment · March 17, 2023, 12:30pm

To summarize Idea 1:

Stop boxing captures that have no syntactically reachable assignment in/after the closure’s definition, as demonstrated in pseudo-code here. It’s a simple idea which appears to require O(n) time.
Because assigning a variable multiple times before capturing it, but not after, is a common pattern, the boxes which would be eliminated by Idea 1 are surprisingly numerous, including: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
Even as simple as Idea 1 is conceptually, it is still superior to the current boxing decision rule.

Absent legitimate protest, I’ll continue.

Explaining Idea 2

Again, Idea 2 does not require type inference in lowering; it’s simply making Core.Box to be type-parameterizable, akin to Base.RefValue. Benefit can be immediately realized in the cases where a variable is type-annotated, which again doesn’t require type inference in lowering.

From Performance Tips:

if it is known that a captured variable does not change its type, then this can be declared explicitly with a type annotation (on the variable, not the right-hand side)
…
The type annotation partially recovers lost performance due to capturing because the parser can associate a concrete type to the object in the box.

Consider the following code, in which the programmer follows this guidance by annotating s as Int to achieve type-stability:

function f(x::Int)
    s::Int = 0
    foreach(1:x) do i; s += 1 end
    return s
end

Inspect the lowered code, and you will see that s is contained in a mutable, type-unstable Core.Box, and Int is asserted whenever s is assigned or accessed. Note that lowering doesn’t need to know what an Int is; it just needs to know that s is being annotated as one. It’s obvious that this same information could be used to type-parameterize the Box, if only Box were type-parameterizable.

To demonstrate the performance improvement to be had, consider two functions: fa, which imitates f using Ref{Any} to mimic the type-unstable Box, and fb, which uses a Ref{Int} to mimic the proposed type-parameterized Box{Int}:

function fa(x::Int)
    s = Ref{Any}(0::Int)
    foreach(1:x) do i; s[] = (s[]::Int + 1)::Int end
    return s[]::Int
end
function fb(x::Int)
    s = Ref{Int}(0)
    foreach(1:x) do i; s[] = s[] + 1 end
    return s[]
end

(Note, fa asserts ::Int whenever s is accessed or assigned, just like f. You can also add ::Int assertions into fb; those don’t matter.)

Let’s check the performance:

julia> using BenchmarkTools
       a=@btime fa($1000)
       b=@btime fb($1000)
       a ≡ b
  4.143 μs (489 allocations: 7.64 KiB)
  1.800 ns (0 allocations: 0 bytes)
true

(Note, you can run @btime f($1000) to confirm that f and fa are similar.)

Clearly, later compiler stages are heavily optimizing fb, and the same optimizations aren’t applied to fa, which has ~1000x worse performance. If we type-parameterized Core.Box as Idea 2 proposes, then f would act like fb instead of fa, and we’d unlock the full performance that type annotation should allow us.

In most cases, the benefit from type-parameterizing Box won’t be as great as in this example, but they’re often still quite substantial. For the example abmult2 in Performance Tips, the benefit is about a 30% timing improvement; other functions I’ve seen improved about 50%. As it stands, we’re leaving a lot of performance on the table.

Notes:

This idea was proposed by @rapus95 three years ago, but it didn’t receive satisfactory consideration.
Similar performance improvements could be had for type-annotated globals.
At some point, when lowering can eventually type-infer, you’d want type-parameterized Boxes anyway—so this idea should be about as controversial as the round earth.