I hope this is an OK category to put this under, I couldn’t see anything that matched better
I’m trying to get better at reading assembly output by looking at some interesting optimisations. I’m currently looking at the famous sum of n
integers optimisation. When I look at the sum of contiguous integers
julia> @code_native debuginfo=:none syntax=:intel sum(i for i in 1:1000)
.text
mov rcx, qword ptr [rdi]
mov rdx, qword ptr [rdi + 8]
mov rsi, rdx
sub rsi, rcx
jge L18
xor eax, eax
ret
L18:
lea rax, [rcx + 1]
imul rax, rsi
mov rdi, rcx
not rdi
add rdx, rdi
mulx rsi, rdx, rsi
shld rsi, rdx, 63
add rax, rcx
add rax, rsi
ret
nop word ptr cs:[rax + rax]
I can work through and figure out that the assembly is basically implementing the formula (stop^2 - start^2 + start + stop)/2
, which is exactly what I expect.
When I look at the assembly for a sum of strided integers, however, the assembly seems to be a normal loop
julia> @code_native debuginfo=:none syntax=:intel sum(i for i in 3:3:1000)
.text
mov rcx, qword ptr [rdi + 16]
mov rsi, qword ptr [rdi]
mov r8, qword ptr [rdi + 8]
test r8, r8
setg al
cmp rsi, rcx
setl dl
cmp rsi, rcx
je L35
xor al, dl
je L35
xor eax, eax
ret
L35:
lea rdi, [rsi + r8]
add rcx, r8
sub rcx, rsi
xor edx, edx
nop
L48:
mov rax, rsi
lea rsi, [rdi + rdx]
add rsi, rax
add rdx, r8
cmp rcx, rdx
jne L48
ret
nop word ptr cs:[rax + rax]
which I took to mean that LLVM just didn’t didn’t implement the optimisation for strided ranges. The strange thing is that benchmarking seems to show that this is actually a constant-time algorithm
julia> time(minimum(@benchmark sum(i for i in 3:3:10)))
1.636
julia> time(minimum(@benchmark sum(i for i in 3:3:10000)))
1.637
julia> time(minimum(@benchmark sum(i for i in 3:3:10000000)))
1.636
julia> time(minimum(@benchmark sum(i for i in 3:3:10000000000)))
1.636
but only when applied to native integers
julia> time(minimum(@benchmark sum(i for i in $(3:3:big(10)))))
307.8089430894309
julia> time(minimum(@benchmark sum(i for i in $(3:3:big(100)))))
4518.714285714285
julia> time(minimum(@benchmark sum(i for i in $(3:3:big(1000)))))
46339.0
julia> time(minimum(@benchmark sum(i for i in $(3:3:big(10000)))))
467333.0
So what’s happening here? Is @code_native
showing the wrong output, or is the loop just so fast that it appears to be constant time?
Edit: Should probably mention:
julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)