# Missing iteration invariant optimization?

I have a feeling that iterators are harmful. Consider:

``````function baz1(n)
for i=1:n
i <1 && return 1
i>n &&  return 2
end
0
end

function baz2(n)
I = 1:n
state = start(I)
while !done(I, state)
(i, state) = next(I, state)
i <1 && return 1
i>n &&  return 2
end
0
end

function baz3(n)
i=1
while i<n
i <1 && return 1
i>n &&  return 2
i +=1
end
0
end
``````

Now, baz1 and baz2 are equivalent according to docs (and tested with @code_llvm on both 0.6 and 0.7 nightly). Letâ€™s look at the code:

``````julia> @code_llvm baz1(10)

; Function baz1
; Location: REPL[7]:2
define i64 @julia_baz1_63294(i64) {
top:
%1 = icmp sgt i64 %0, 0
%2 = select i1 %1, i64 %0, i64 0
%3 = add i64 %2, 1
br label %L3

L3:                                               ; preds = %L12, %top
%"#temp#.0" = phi i64 [ 1, %top ], [ %6, %L12 ]
%4 = icmp eq i64 %"#temp#.0", %3
br i1 %4, label %L18, label %if

if:                                               ; preds = %L3
; Location: REPL[7]:3
%5 = icmp sgt i64 %"#temp#.0", 0
br i1 %5, label %L12, label %L18

L18:                                              ; preds = %L12, %if, %L3
%merge = phi i64 [ 0, %L3 ], [ 1, %if ], [ 2, %L12 ]
; Location: REPL[7]:6
ret i64 %merge

L12:                                              ; preds = %if
; Location: REPL[7]:2
%6 = add nuw i64 %"#temp#.0", 1
; Location: REPL[7]:4
%7 = icmp sgt i64 %"#temp#.0", %0
br i1 %7, label %L18, label %L3
}
``````

On the other hand,

``````julia> @code_llvm baz3(10)

; Function baz3
; Location: REPL[3]:2
define i64 @julia_baz3_63295(i64) {
top:
; Location: REPL[3]:3
br i1 undef, label %L17, label %L17

L17:                                              ; preds = %top, %top
; Location: REPL[3]:8
ret i64 0
}
``````

The relevance to bounds checks, and hence general performance, is obvious. I think something must be done to help llvm in â€śfor i=a:bâ€ť loops, be it more annotations, a special case for iterators over unit ranges, or a new llvm pass.

Is there a way to obtain a better code_llvm that contains all optimization metadata (all the align / noalias / etc annotations)? Is there a way to obtain code_llvm at each pass?

Big oops. baz3 is of course not equivalent to baz1 and baz2, the correct code is

``````julia> function baz4(n)
i=1
while i<=n
i <1 && return 1
i>n &&  return 2
i +=1
end
0
end

julia> @code_llvm baz4(10)

; Function baz4
; Location: REPL[16]:2
define i64 @julia_baz4_63322(i64) {
top:
; Location: REPL[16]:3
%1 = icmp slt i64 %0, 1
br i1 %1, label %L17, label %if.preheader

; Location: REPL[16]:4
br label %if

L3:                                               ; preds = %if
; Location: REPL[16]:6
%2 = add nuw i64 %i.03, 1
; Location: REPL[16]:3
%3 = icmp sgt i64 %2, %0
br i1 %3, label %L17.loopexit, label %if

if:                                               ; preds = %if.preheader, %L3
%i.03 = phi i64 [ %2, %L3 ], [ 1, %if.preheader ]
; Location: REPL[16]:4
%4 = icmp sgt i64 %i.03, 0
br i1 %4, label %L3, label %L17.loopexit

L17.loopexit:                                     ; preds = %if, %L3
%merge.ph = phi i64 [ 1, %if ], [ 0, %L3 ]
; Location: REPL[16]:8
br label %L17

L17:                                              ; preds = %L17.loopexit, %top
%merge = phi i64 [ 0, %top ], [ %merge.ph, %L17.loopexit ]
ret i64 %merge
}
``````

which manages to remove only case 2. Still, there should be a way to get these simple loop invariants figured out.

Edit: â€śgcc -O3â€ť figures this out in C; â€śclang -O3â€ť doesnâ€™t (tested clang 5.0 and gcc 7.2.0).

The title was rather overblown so Iâ€™ve edited it to be a bit more restrained. Since `clang` does not do this optimization itâ€™s not too surprising that Julia doesnâ€™t either â€“ itâ€™s LLVM either way. This isnâ€™t a fundamental issue with iterators though, itâ€™s just a missing optimization in LLVM. My guess is that since this kind of optimization is quite natural to do â€śby handâ€ť (i.e. notice that the branch conditions canâ€™t be met and delete them entirely) it hasnâ€™t been a pressing matter for LLVM. Do you have a use case where this ends up being a problem?

1 Like

Re title: sorry for the bad vibes; thanks for correcting this.

``````using BenchmarkTools
function enumsumtest(V)
s=0
for i=1:length(V)
for (j,v) = enumerate(V)
s+=v
end
end
s
end

function ib_enumsumtest(V)
s=0
@inbounds for i=1:length(V)
for (j,v) in enumerate(V)
s+=v
end
end
s
end
``````
``````V = Vector{Int}(5_000);
@btime enumsumtest(V)
21.564 ms (1 allocation: 16 bytes)

@btime ib_enumsumtest(V)
3.009 ms (1 allocation: 16 bytes)
``````

In order to optimize out the bounds-checks, there are actually three issues: First, the enumerate confuses llvm (baz1/baz2 vs baz4); second, llvm somehow fails at something (baz3 vs baz4), and third llvm fails to track the length of the vector (it thinks the length is volatile?).

Consider:

``````function enumsumtest_wrong_hint(V)
s=0
n=length(V)
for i=1:n
j=1
while j<n
j < 1 && return 1
j > n && return 2
@inbounds s+=V[j]
j += 1
end
end
s
end
@btime enumsumtest_wrong_hint(V)
3.052 ms (1 allocation: 16 bytes)
``````

which is tantalizingly close to what we want!

On the other hand,

``````function enumsumtest_wrong_nohint(V)
s=0
n=length(V)
for i=1:n
j=1
while j<length(V)
j < 1 && return 1
j > length(V) && return 2
@inbounds s+=V[j]
j += 1
end
end
s
end
@btime enumsumtest_wrong_nohint(V)
19.277 ms (1 allocation: 16 bytes)
``````

Edit: I rewrote this and opened an issue https://github.com/JuliaLang/julia/issues/24925

1 Like

Something that should give us hope for getting rid of most boundschecks:

The following correctly vectorizes on 0.6 and emits no boundschecks (even though there is no @inbounds):

``````function foo_x(V::Vector{T}) where {T}
s=zero(T)
state = 0
while reinterpret(UInt64,state) <  reinterpret(UInt64, size(V,1))
state += 1
v =V[state]
s += v
end
return s
end
``````