# Performance of generator expressions and higher order functions

Hello,

while trying to optimize some Julia code I wrote, I encountered several performance issues all following the same kind of pattern, related to using higher-order functions and generator expressions nested within an outer loop.

Below is a somewhat stupid example, but I think it illustrates my point well enough is a self-contained way. These are three implementations of the same algorithm, in which two loops are involved. All timings below were obtained using `BenchmarkTools` and `julia -O3`.

First variant: both loops are written with `for` loops.

``````function v1(n :: Int)
prime = Array{Bool}(undef, n)
fill!(prime, true)
for i in 2:n
for j in 2:i-1
if i%j==0
prime[i] = false
break
end
end
end
end
``````
``````julia> @btime v1(1000)
307.409 μs (1 allocation: 1.06 KiB)
``````

Second variant: the inner loop has been replaced by a call to a higher-order function (`any` in this case) and a generator expression. Notice how the run time and number of allocations increase.

``````function v2(n :: Int)
prime = Array{Bool}(undef, n)
fill!(prime, true)
for i in 2:n
if any(i%j==0 for j in 2:i-1)
prime[i] = false
end
end
end
``````
``````julia> @btime v2(1000)
4.536 ms (22201 allocations: 379.16 KiB)
``````

I can imagine why this would be the case, since maybe temporary objects are needed to instantiate the generators, and the optimizer does not see how to avoid them. What puzzles me most is this third variant, in which both loops are using higher-order functions. Replacing the outer loop by a call to `foreach` gives the performance back!

``````function v3(n :: Int)
prime = Array{Bool}(undef, n)
fill!(prime, true)
foreach(2:n) do i
if any(i%j==0 for j in 2:i-1)
prime[i] = false
end
end
end
``````
``````julia> @btime v3(1000)
313.988 μs (1 allocation: 1.06 KiB)
``````

Does anybody know what happens in this case? Or am I doing anything wrong? Would there be a way to improve version 2 in such a way that it can use higher-order functions to replace the inner loop, but still not kill the performance when used in a “regular” (i.e. `for` or `while`) outer loop?

1 Like

You are unfortunately hitting one of the compiler performance “bugs” regarding variables that are closed over (https://github.com/JuliaLang/julia/issues/15276). It can be seen using `@code_warntype` that we have the dreaded `Core.Box` as described in that issue. We can fix it by helping julia a bit:

``````function v2(n :: Int)
prime = Array{Bool}(undef, n)
fill!(prime, true)
for i in 2:n
let i = i
if any(i%j==0 for j in 2:i-1)
prime[i] = false
end
end
end
end
``````
``````julia> @btime v2(1000)
327.984 μs (1 allocation: 1.06 KiB)
``````
7 Likes

Thanks!

So this also explains why v3 works better: the `do` block creates a function taking `i` as argument, which has the same effect as introducing the `let i=i` block. Did I understand things correctly?

BTW, for future reference, following the bug you mentioned led me to the Performance of captured variable paragraph of the Julia manual, in which this problem is explained in more details.

Thanks again!

I don’t get it. I used to believe that the following two versions are lowered identically (I just made all scopes and loops explicit):

``````julia> function v5a(n :: Int)
prime = Array{Bool}(undef, n)
fill!(prime, true)
rr=2:n
it = iterate(rr)
@label loopstart
(it === nothing) && @goto loopend
let (i,s) = it
if any(i%j==0 for j in 2:i-1)
prime[i]=false
end
it = iterate(rr,s)
@goto loopstart
end
@label loopend
return prime
end

julia> function v5b(n :: Int)
prime = Array{Bool}(undef, n)
fill!(prime, true)
rr=2:n
it = iterate(rr)
@label loopstart
(it === nothing) && @goto loopend
let
(i,s) = it
if any(i%j==0 for j in 2:i-1)
prime[i]=false
end
it = iterate(rr,s)
@goto loopstart
end
@label loopend
return prime
end
``````

However,

``````julia> N=1000; v5a(N);v5b(N);@time v5a(N);@time v5b(N);
0.000459 seconds (5 allocations: 1.219 KiB)
0.007087 seconds (22.20 k allocations: 379.313 KiB)
``````

Imo both variants should be identical after lowering (before inference)?

Or is there any difference in semantics between `let a = b \n body end` and `let \n a = b \n body end`, under the assumption that the identifier `a` is not present in the outer function scope (which is know during lowering!)?

If there is no semantic difference, then there is quite a bit of pre-inference optimization potential.

1 Like

Yeah.

``````julia> Meta.lower(Main, quote
let x = x
nothing
end
end)
:(\$(Expr(:thunk, CodeInfo(

2 1 ─ %1 = x                                                                                                                                                                                                                              │
│        x = %1                                                                                                                                                                                                                         │
3 └──      return nothing                                                                                                                                                                                                                 │
))))
``````

vs.

``````julia> Meta.lower(Main, quote
let
x = x
nothing
end
end)
:(\$(Expr(:thunk, CodeInfo(
3 1 ─     x = x                                                                                                                                                                                                                           │
4 └──     return nothing                                                                                                                                                                                                                  │
))))
``````

This is indeed a very good point. From my understanding of the documentation of `let`, there should be no difference between both forms: if the relevant variable does not exist in the outer function scope, I don’t see what could be done except create a new location.

The only thing which would explain the difference in performance is if the compiler is not aware (yet) of the non-existence of the variable in the outer scope, and can thus not know whether the assignment will modify an existing location or create a new one… At least that is how I understand things.

That being said, this behavior might explain the difference between versions 2 and 3 above, but only if a `for` loop introduces a `let` block around its body. And I’m not sure this is what gets done. On the one hand, the documentation for the Iteration interface states that a for loop is translated into:

``````next = iterate(iter)
while next !== nothing
(i, state) = next
# body
next = iterate(iter, state)
end
``````

without mentioning any `let` block. On the other hand, the Scope of variables paragraph mentions that

`for` loops, `while` loops, and Comprehensions have the following behavior: any new variables introduced in their body scopes are freshly allocated for each loop iteration, as if the loop body were surrounded by a `let` block.

and the example just below that effectively shows that closures taken from the body loop capture different variable locations. So that now I’m confused as to why a `let` block surrounding the whole iteration body changes anything…

So I guess my question becomes: is there an implicit `let` block around a `for` loop body, and if so, what form does it take?

But by assigning `x = x`, your example supposes that there exists an outer binding for the variable `x` (in the right-hand side). Whereas @foobar_lv2’s example supposed that there exist no outer binding, which is exactly why both forms of the `let` block should behave in the same way.

And indeed, there isn’t much difference between

``````julia> Meta.lower(Main, quote
let x = b
nothing
end
end)

:(\$(Expr(:thunk, CodeInfo(
│2 1 ─     x = b
│3 └──     return nothing
))))

``````

and

``````julia> Meta.lower(Main, quote
let
x = b
nothing
end
end)

:(\$(Expr(:thunk, CodeInfo(
│3 1 ─     x = b
│4 └──     return nothing
))))
``````

(but I’m not sure exactly what the numbers on the left mean, or if they are relevant here…)

BTW, thanks for `Meta.lower`: I only knew of `@code_lowered`

``````julia> Meta.lower(Main, quote
let x = b
nothing
end
end)
:(\$(Expr(:thunk, CodeInfo(
2 1 ─     x = b                                                                          │
3 └──     return nothing                                                                 │
))))

julia> Meta.lower(Main, quote
let
x = b
nothing
end
end)
:(\$(Expr(:thunk, CodeInfo(
3 1 ─     x = b                                                                          │
4 └──     return nothing                                                                 │
))))
``````

So the question remains: Are there semantic differences? Or is it possible to do a pure AST optimization of an entire function that removes some closure-boxes by moving all scoped variables into explicit let-clauses?

If the latter, then shouldn’t this be done during lowering (ie as a very early compiler pass), instead of by macro or by hand?