New iteration protocol much slower

matthieu · July 4, 2018, 8:23pm

I want to convert an iteration protocol from 0.6 to 0.7 but I obtain a huge slowdown
On julia 0.6:

function f(iter)
   state = start(iter)
   i = 0
   while !(done(iter, state)) & (i < 1_000_000)
      i += 1
      v, state = next(iter, state)
   end
end
@time f(zeros(1_000_000))
#   0.008861 seconds (2.02 k allocations: 7.734 MiB)

On Julia 0.7

function f(iter)
   x = iterate(iter)
   i = 0
   while (x !== nothing) & (i < 1_000_000)
     i += 1
      v, state = x
      x = iterate(iter, state)
   end
end
@time f(zeros(1_000_000))
#   0.239210 seconds (3.00 M allocations: 68.649 MiB, 1.95% gc time)

I don’t understand why the i < 1_000_000 slows down performances that much.
As a user, I have to say that I am not a fan of this new iteration protocol. The previous iteration protocol was clearer and did not have these performance traps.

bkamins · July 4, 2018, 9:13pm

The MWE is the following I would say:

function g(iter)
   x = iterate(iter)
   while (x !== nothing) && true
       v, state = x
       x = iterate(iter, state)
   end
end
@btime g(zeros(1_000_000))

and it has the same performance penalty, while already this is fast:

function h(iter)
   x = iterate(iter)
   while true && (x !== nothing)
       v, state = x
       x = iterate(iter, state)
   end
end
@btime h(zeros(1_000_000))

tkoolen · July 4, 2018, 9:14pm

You’re mostly measuring compilation time this way, but there is an actual runtime performance issue.

Try

using BenchmarkTools
@btime f(x) setup = x = zeros(1_000_000)

Result on latest nightly: 297.076 ms (2998980 allocations: 61.02 MiB)
Result on 0.6.3: 812.711 μs (0 allocations: 0 bytes)

Not sure yet what the issue is.

Without the & (i < 1_000_000) almost the whole function is optimized away on 0.7, it seems.

Keno · July 4, 2018, 9:30pm

The compiler currently has some trouble tracking the extra conditional through the branch. I’d recommend either reversing the order of the conditions (and using && as suggested by @bkamins above) or preferably not using the raw iteration protocol directly (e.g. this particular case is equivalent to Iterators.take(x, iter) .

foobar_lv2 · July 5, 2018, 12:58pm

I don’t get it. Why is inference capable of understanding one variant, but not the other? Why are there no phi-nodes in the lowered code? Or is phi-nodification of the lowering an ongoing unfinished project? (sorry for the naive question on the state of the julia-side of the compiler)

julia> a=rand(1000); 
julia> function g_good(iter)
              i=0
              x=iterate(iter)
              @label loopentry
              if true && x!==nothing
              i += 1 
              v,state = x
              x=iterate(iter,state)
              @goto loopentry
              end
              i
              end

julia> @code_lowered g_good(a)
CodeInfo(
2  1 ─       Core.NewvarNode(:(v))                                                                                                  │
   │         Core.NewvarNode(:(state))                                                                                              │
   │         Core.NewvarNode(:(#temp#@_6))                                                                                          │
   │         i = 0                                                                                                                  │
3  └──       x = (Main.iterate)(Core.Compiler.Argument(2))                                                                          │
5  2 ┄       goto 4 if not true                                                                                                     │
   3 ─       #temp#@_8 = Core.Compiler.Argument(7) !== Main.nothing                                                                 │
   └──       goto 5                                                                                                                 │
   4 ─       #temp#@_8 = false                                                                                                      │
   5 ┄       goto 7 if not %%#temp#@_8                                                                                              │
6  6 ─       i = Core.Compiler.Argument(3) + 1                                                                                      │
7  │   %12 = Base.indexed_iterate(%%x, 1)::Any                                                                                      │
   │         v = (Core.getfield)(Core.SSAValue(12), 1)                                                                              │
   │         #temp#@_6 = (Core.getfield)(Core.SSAValue(12), 2)                                                                      │
   │   %15 = Base.indexed_iterate(%%x, 2, %%#temp#@_6)::Any                                                                         │
   │         state = (Core.getfield)(Core.SSAValue(15), 1)                                                                          │
   │         #temp#@_6 = (Core.getfield)(Core.SSAValue(15), 2)                                                                      │
8  │         x = (Main.iterate)(Core.Compiler.Argument(2), Core.Compiler.Argument(5))                                               │
9  └──       goto 2                                                                                                                 │
11 7 ─       return %%i                                                                                                             │
)

julia> function g_bad(iter)
              i=0
              x=iterate(iter)
              @label loopentry
              if  x!==nothing && true
              i += 1 
              v,state = x
              x=iterate(iter,state)
              @goto loopentry
              end
              i
              end
julia> @code_lowered g_bad(a)
CodeInfo(
2  1 ─       Core.NewvarNode(:(v))                                                                                                  │
   │         Core.NewvarNode(:(state))                                                                                              │
   │         Core.NewvarNode(:(#temp#@_6))                                                                                          │
   │         i = 0                                                                                                                  │
3  └──       x = (Main.iterate)(Core.Compiler.Argument(2))                                                                          │
5  2 ┄ %6  = Main.:!==(%%x, Main.nothing)::Any                                                                                      │
   └──       goto 4 if not %6                                                                                                       │
   3 ─       #temp#@_8 = true                                                                                                       │
   └──       goto 5                                                                                                                 │
   4 ─       #temp#@_8 = false                                                                                                      │
   5 ┄       goto 7 if not %%#temp#@_8                                                                                              │
6  6 ─       i = Core.Compiler.Argument(3) + 1                                                                                      │
7  │   %13 = Base.indexed_iterate(%%x, 1)::Any                                                                                      │
   │         v = (Core.getfield)(Core.SSAValue(13), 1)                                                                              │
   │         #temp#@_6 = (Core.getfield)(Core.SSAValue(13), 2)                                                                      │
   │   %16 = Base.indexed_iterate(%%x, 2, %%#temp#@_6)::Any                                                                         │
   │         state = (Core.getfield)(Core.SSAValue(16), 1)                                                                          │
   │         #temp#@_6 = (Core.getfield)(Core.SSAValue(16), 2)                                                                      │
8  │         x = (Main.iterate)(Core.Compiler.Argument(2), Core.Compiler.Argument(5))                                               │
9  └──       goto 2                                                                                                                 │
11 7 ─       return %%i                                                                                                             │
)

jebej · July 5, 2018, 2:21pm

If I switch the conditions and use &&, I get,

on v0.6:

julia> function f(iter)                                
          state = start(iter)                          
          i = 0                                        
          while (i < 1_000_000) && !(done(iter, state))
             i += 1                                    
             v, state = next(iter, state)              
          end; return i                                
       end                                             
f (generic function with 1 method)                     
                                                       
julia> @btime f(x) setup = x = zeros(1_000_000)        
  812.663 μs (0 allocations: 0 bytes)                  
1000000

and on 0.7

julia> function f(iter)
          x = iterate(iter)
          i = 0
          while (i < 1_000_000) && (x !== nothing)
            i += 1
             v, state = x
             x = iterate(iter, state)
          end; return i
       end
f (generic function with 1 method)

julia> @btime f(x) setup = x = zeros(1_000_000)
  541.775 μs (0 allocations: 0 bytes)
1000000

You need to return i because the whole iteration otherwise gets compiled out (on 0.7).

ExpandingMan · July 5, 2018, 2:47pm

I do really like the protocol itself, it’s very clean and simple for users, but I must say I am mildly terrified of performance traps. Issues like the one demonstrated here should be prominently documented. It wouldn’t be even close to obvious to me why @matthieu’s code is so incredibly slow on 0.7.

Out of curiosity, is there any hope of improvement on this particular issue before 1.0?

Keno · July 5, 2018, 3:52pm

We currently do a very limited form for backwards type inference at branch points, i.e. something like:

cond = x === nothing
if cond
   # x known to be nothing
end

This currently only works with === and only if that === is fairly directly passed into a branch and not obscured by further processing. There’s ways to do more general backwards inference, but they are not implemented yet.

SSA conversion currently happens after type inference, but before optimizations. In the future it may happen earlier, as part of lowering, but not at the moment.

Keno · July 5, 2018, 3:55pm

We needed to get the iteration protocol in because it’s a breaking change. There are various compiler improvements to be done still, but none of them are breaking, so they’re not priorities for 1.0. Until, then we need to be continue being a bit careful with union types in performance sensitive code (just as we need to be careful about type stability). The general guidelines here is that Union{T, Nothing} should be compared using === (or !===), and that the result should be directly used as a condition. In the future the compiler will likely better at this, but for now you’ll need to follow the guidelines for performance. Also, I continue to recommend using some of the higher level iterator abstractions. I know using the raw iteration protocol is tempting, but there are pitfalls (as there were with the old protocol).

StefanKarpinski · July 5, 2018, 4:56pm

It is fairly typical that new features have various optimization pitfalls initially which get filled in over time as people report problematic cases and the compiler is improved to support such cases. After some iteration on this the feature becomes reliably fast. This process takes a little time but happens faster than you might think and soon you’ll forget that the performance was ever unreliable. Bottom line: give it some time—the new iteration protocol is already better than the old one was initially (which of course, hardly anyone remembers).

jeff.bezanson · July 5, 2018, 9:47pm

Please file an issue about this, and hopefully the performance discrepancy can be fixed in a 1.0.x release or before.

Azamat · November 24, 2018, 5:27am

Was issue filed here? I can file one otherwise.

StefanKarpinski · November 24, 2018, 3:52pm

Seems fine on master:

julia> @btime f(zeros(1_000_000))
  1.182 ms (2 allocations: 7.63 MiB)

Topic		Replies	Views
Question about performance when iterating New to Julia	3	448	March 18, 2020
For loop Performance Performance question	8	1650	January 11, 2020
Very slow array creation (and Benchmark iterate error) Performance	6	1091	May 15, 2019
1.0 progress/status Internals & Design	60	19923	April 20, 2019
Seemingly large permformance regression on "^" power operator in >=1.6 General Usage bug , performance	2	409	April 28, 2021

New iteration protocol much slower

Related topics