To summarize Idea 1:
- Stop boxing captures that have no syntactically reachable assignment in/after the closure’s definition, as demonstrated in pseudo-code here. It’s a simple idea which appears to require O(n) time.
- Because assigning a variable multiple times before capturing it, but not after, is a common pattern, the boxes which would be eliminated by Idea 1 are surprisingly numerous, including: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
- Even as simple as Idea 1 is conceptually, it is still superior to the current boxing decision rule.
Absent legitimate protest, I’ll continue.
Explaining Idea 2
Again, Idea 2 does not require type inference in lowering; it’s simply making Core.Box
to be type-parameterizable, akin to Base.RefValue
. Benefit can be immediately realized in the cases where a variable is type-annotated, which again doesn’t require type inference in lowering.
From Performance Tips:
if it is known that a captured variable does not change its type, then this can be declared explicitly with a type annotation (on the variable, not the right-hand side)
…
The type annotation partially recovers lost performance due to capturing because the parser can associate a concrete type to the object in the box.
Consider the following code, in which the programmer follows this guidance by annotating s
as Int
to achieve type-stability:
function f(x::Int)
s::Int = 0
foreach(1:x) do i; s += 1 end
return s
end
Inspect the lowered code, and you will see that s
is contained in a mutable, type-unstable Core.Box
, and Int
is asserted whenever s
is assigned or accessed. Note that lowering doesn’t need to know what an Int
is; it just needs to know that s
is being annotated as one. It’s obvious that this same information could be used to type-parameterize the Box
, if only Box
were type-parameterizable.
To demonstrate the performance improvement to be had, consider two functions: fa
, which imitates f
using Ref{Any}
to mimic the type-unstable Box
, and fb
, which uses a Ref{Int}
to mimic the proposed type-parameterized Box{Int}
:
function fa(x::Int)
s = Ref{Any}(0::Int)
foreach(1:x) do i; s[] = (s[]::Int + 1)::Int end
return s[]::Int
end
function fb(x::Int)
s = Ref{Int}(0)
foreach(1:x) do i; s[] = s[] + 1 end
return s[]
end
(Note, fa
asserts ::Int
whenever s
is accessed or assigned, just like f
. You can also add ::Int
assertions into fb
; those don’t matter.)
Let’s check the performance:
julia> using BenchmarkTools
a=@btime fa($1000)
b=@btime fb($1000)
a ≡ b
4.143 μs (489 allocations: 7.64 KiB)
1.800 ns (0 allocations: 0 bytes)
true
(Note, you can run @btime f($1000)
to confirm that f
and fa
are similar.)
Clearly, later compiler stages are heavily optimizing fb
, and the same optimizations aren’t applied to fa
, which has ~1000x worse performance. If we type-parameterized Core.Box
as Idea 2 proposes, then f
would act like fb
instead of fa
, and we’d unlock the full performance that type annotation should allow us.
In most cases, the benefit from type-parameterizing Box
won’t be as great as in this example, but they’re often still quite substantial. For the example abmult2
in Performance Tips, the benefit is about a 30% timing improvement; other functions I’ve seen improved about 50%. As it stands, we’re leaving a lot of performance on the table.
Notes:
- This idea was proposed by @rapus95 three years ago, but it didn’t receive satisfactory consideration.
- Similar performance improvements could be had for type-annotated globals.
- At some point, when lowering can eventually type-infer, you’d want type-parameterized
Box
es anyway—so this idea should be about as controversial as the round earth.