This really is maddening to me, because in the code you’ve posted above, the third example does not have loop dependencies as far as I can tell. That may be different in your real code, but it isn’t in what you posted here. Why should it not be allowed to unroll the third function?
Because it isn’t doing any work:
code_llvm
julia> @code_llvm h(ops_s, 4,7)
; @ REPL[2]:1 within `h`
define void @julia_h_403({}* nonnull align 16 dereferenceable(40) %0, i64 signext %1, i64 signext %2) #0 {
top:
; @ REPL[2]:2 within `h`
; ┌ @ array.jl:809 within `iterate` @ array.jl:809
; │┌ @ array.jl:215 within `length`
%3 = bitcast {}* %0 to { i8*, i64, i16, i16, i32 }*
%4 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %3, i64 0, i32 1
%5 = load i64, i64* %4, align 8
; │└
; │┌ @ int.jl:477 within `<` @ int.jl:470
%.not = icmp eq i64 %5, 0
; │└
br i1 %.not, label %L53, label %L9
L9: ; preds = %top
; │┌ @ array.jl:835 within `getindex`
%6 = bitcast {}* %0 to {}***
%7 = load {}**, {}*** %6, align 8
%8 = load {}*, {}** %7, align 8
%.not13 = icmp eq {}* %8, null
br i1 %.not13, label %fail, label %L18
L18: ; preds = %L9
; └└
; @ REPL[2]:10 within `h`
; ┌ @ array.jl:809 within `iterate`
; │┌ @ int.jl:477 within `<` @ int.jl:470
%.not1417 = icmp ugt i64 %5, 1
; │└
br i1 %.not1417, label %L42, label %L53
L20: ; preds = %L42
; │┌ @ int.jl:87 within `+`
%9 = add nuw i64 %value_phi418, 1
; │└
; │┌ @ int.jl:477 within `<` @ int.jl:470
%exitcond.not = icmp eq i64 %value_phi418, %5
; │└
br i1 %exitcond.not, label %L53, label %L42
L42: ; preds = %L20, %L18
%10 = phi i64 [ %value_phi418, %L20 ], [ 1, %L18 ]
%value_phi418 = phi i64 [ %9, %L20 ], [ 2, %L18 ]
; │┌ @ array.jl:835 within `getindex`
%11 = getelementptr inbounds {}*, {}** %7, i64 %10
%12 = load {}*, {}** %11, align 8
%.not15 = icmp eq {}* %12, null
br i1 %.not15, label %fail5, label %L20
L53: ; preds = %L20, %L18, %top
; └└
ret void
fail: ; preds = %L9
; @ REPL[2]:2 within `h`
; ┌ @ array.jl:809 within `iterate` @ array.jl:809
; │┌ @ array.jl:835 within `getindex`
call void @jl_throw({}* inttoptr (i64 140332589220768 to {}*))
unreachable
fail5: ; preds = %L42
; └└
; @ REPL[2]:10 within `h`
; ┌ @ array.jl:809 within `iterate`
; │┌ @ array.jl:835 within `getindex`
call void @jl_throw({}* inttoptr (i64 140332589220768 to {}*))
unreachable
; └└
}
L20 to L42 is the main loop body, which loads an element, compares it with null (if my LLVM reading isn’t too rusty) and then either jumps right back to the start of the loop or exits.
code_native
julia> @code_native h(ops_s, 4,7)
.text
; ┌ @ REPL[2]:1 within `h`
subq $8, %rsp
; │ @ REPL[2]:2 within `h`
; │┌ @ array.jl:809 within `iterate` @ array.jl:809
; ││┌ @ array.jl:215 within `length`
movq 8(%rdi), %rax
; ││└
; ││┌ @ int.jl:477 within `<` @ int.jl:470
testq %rax, %rax
; ││└
je L63
; ││┌ @ array.jl:835 within `getindex`
movq (%rdi), %rcx
cmpq $0, (%rcx)
je L87
; │└└
; │ @ REPL[2]:10 within `h`
; │┌ @ array.jl:809 within `iterate`
; ││┌ @ int.jl:477 within `<` @ int.jl:470
cmpq $2, %rax
; ││└
jb L63
; │└
; │┌ @ array.jl within `iterate`
movl $1, %edx
nopw %cs:(%rax,%rax)
; │└
; │┌ @ array.jl:809 within `iterate`
; ││┌ @ array.jl:835 within `getindex`
L48:
cmpq $0, (%rcx,%rdx,8)
je L65
; ││└
; ││┌ @ int.jl:477 within `<` @ int.jl:470
incq %rdx
cmpq %rdx, %rax
; ││└
jne L48
; │└
L63:
popq %rax
retq
; │┌ @ array.jl:809 within `iterate`
; ││┌ @ array.jl:835 within `getindex`
L65:
movabsq $jl_throw, %rax
movabsq $jl_system_image_data, %rdi
callq *%rax
; │└└
; │ @ REPL[2]:2 within `h`
; │┌ @ array.jl:809 within `iterate` @ array.jl:809
; ││┌ @ array.jl:835 within `getindex`
L87:
movabsq $jl_throw, %rax
movabsq $jl_system_image_data, %rdi
callq *%rax
nopl (%rax)
; └└└
Same goes for the native code, except here the loop body is called L48. It compares the loaded element with 0; if it’s equal it jumps out of the loop, otherwise it increments a counter and goes around again. The loop is probably still there only because the size of the input array is not fixed/known to the compiler, else it would elide that as well.
All this is possible because the result of the calculations isn’t stored anywhere. The compiler knows that the functions are side-effect free, and thus it just removes the calls that quite literally don’t do anything.
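To see the same effect in isolation, here is a minimal sketch. Both functions are hypothetical stand-ins I made up for illustration (they are not the `h` from this thread), and I’m assuming a plain `Vector{Float64}` input. The first throws its per-element result away, the second keeps it:

```julia
# Hypothetical example functions, not the code from this thread.

# The per-element result is computed and then dropped, so the loop body
# has no observable effect.
function discard_result(xs::Vector{Float64})
    for x in xs
        x * 2.0 + 1.0   # value is never stored or returned
    end
    return nothing
end

# Same arithmetic, but the result feeds into the return value,
# so the compiler has to actually do the work.
function keep_result(xs::Vector{Float64})
    acc = 0.0
    for x in xs
        acc += x * 2.0 + 1.0
    end
    return acc
end

xs = rand(1000)
discard_result(xs)
keep_result(xs)

# To compare the generated code yourself (outside the REPL you need
# `using InteractiveUtils` first):
# @code_llvm discard_result(xs)
# @code_llvm keep_result(xs)
```

I’d expect the first one to show no multiply/add in the loop at all, while the second keeps them, but the exact output depends on your Julia version and CPU. Returning or storing the result is the usual way to make sure a benchmark measures real work.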