Generally, the answer is no, because the loop body only runs if the array isn't empty anyway.
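You can check this claim at the Julia level before reading any assembly. Iterating over an empty collection runs the loop body zero times, so an accumulator initialized to `zero(eltype(x))` comes back untouched. Here is a minimal sketch; the helper name `count_iterations` is made up for illustration:

```julia
# Hypothetical helper: counts how many times a `for` loop body runs over `x`.
function count_iterations(x)
    n = 0
    for _ ∈ x
        n += 1
    end
    return n
end

count_iterations(Float64[])        # 0: the body never runs
count_iterations([1.0, 2.0, 3.0])  # 3
```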
It may help to look at the code the compiler generates:
```julia
julia> x = Float64[];

julia> function mysum(x)
           s = zero(eltype(x))
           for xᵢ ∈ x
               s += xᵢ
           end
           s
       end
mysum (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel mysum(x)
    .text
    mov     rdx, qword ptr [rdi + 8]
    test    rdx, rdx
    je      L62
    mov     rax, qword ptr [rdi]
    vxorpd  xmm0, xmm0, xmm0
    vaddsd  xmm0, xmm0, qword ptr [rax]
    cmp     rdx, 1
    je      L61
    cmp     rdx, 2
    mov     ecx, 2
    cmova   rcx, rdx
    mov     edx, 1
    nop     dword ptr [rax]
L48:
    vaddsd  xmm0, xmm0, qword ptr [rax + 8*rdx]
    inc     rdx
    cmp     rcx, rdx
    jne     L48
L61:
    ret
L62:
    vxorps  xmm0, xmm0, xmm0
    ret
    nop     word ptr cs:[rax + rax]
```
The thing to look at here is the start of the function:
```
    mov     rdx, qword ptr [rdi + 8]
    test    rdx, rdx
    je      L62
```
It loads the length into the `rdx` register and then tests it against itself. `test` performs a bitwise `&` and sets the CPU flags from the result without storing it. Next, if that result was `0`, it `je`s (jumps) to `L62`. Bitwise `&`-ing a number with itself returns `0` only if the number itself is zero, so this checks whether the array is empty and, if it is, jumps immediately.
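The `test`/`je` pair can be mimicked in Julia by modelling the zero flag as `(n & n) == 0`. This is only a sketch of the logic, not of how the CPU sets flags. It confirms that the flag is set exactly when the length is zero:

```julia
# Mimics `test rdx, rdx; je L62`: `test` computes `n & n`, discards it,
# and sets the zero flag if the result is 0; `je` jumps when that flag is set.
zero_flag_after_test(n::UInt64) = (n & n) == 0

for n in UInt64[0, 1, 2, 255, typemax(UInt64)]
    # n & n == n for every n, so the flag is set exactly when n == 0
    @assert zero_flag_after_test(n) == iszero(n)
end
```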
Looking at `L62`:

```
L62:
    vxorps  xmm0, xmm0, xmm0
    ret
    nop     word ptr cs:[rax + rax]
```
It `xor`s the register holding the `Float64` return value (`xmm0`) with itself to set it to zero (because the function just returns `0` if the array is empty), and returns.
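Why does XOR-ing a register with itself give the right floating-point zero? The bit pattern of `+0.0` is all zeros. The sketch below reinterprets a `Float64`'s bits as an integer, XORs them with themselves, and reinterprets the result back. It illustrates the bit-level idea only and is not a model of the `vxorps` instruction, which works on a whole 128-bit register.

```julia
# Reinterpret a Float64's 64 bits as an integer, XOR the bits with themselves,
# and reinterpret the result back: every bit becomes 0, which is +0.0.
function xor_self(v::Float64)
    bits = reinterpret(UInt64, v)
    return reinterpret(Float64, bits ⊻ bits)
end

xor_self(3.5)                     # 0.0
xor_self(-1.0e300)                # 0.0
xor_self(3.5) === zero(Float64)   # true: the value mysum returns for an empty array
```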
What if we added the check, e.g. by putting `isempty(x) && return zero(eltype(x))` at the top of a copy of `mysum` named `mysum_check`?

```julia
julia> @code_native debuginfo=:none syntax=:intel mysum_check(x)
    .text
    mov     rax, qword ptr [rdi + 8]
    test    rax, rax
    je      L46
    mov     rcx, qword ptr [rdi]
    vxorpd  xmm0, xmm0, xmm0
    vaddsd  xmm0, xmm0, qword ptr [rcx]
    cmp     rax, 1
    je      L45
    mov     edx, 1
    nop
L32:
    vaddsd  xmm0, xmm0, qword ptr [rcx + 8*rdx]
    inc     rdx
    cmp     rax, rdx
    jne     L32
L45:
    ret
L46:
    vxorps  xmm0, xmm0, xmm0
    ret
    nop     word ptr cs:[rax + rax]
```
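This code is a little shorter. Both versions add the first element to `0` and return immediately if the length is `1`. Before entering the loop, however, the previous version also computes its loop bound as `max(length, 2)` (`cmp rdx, 2` / `mov ecx, 2` / `cmova rcx, rdx`). Here, the loop compares its counter directly against the length.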
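To compare the two listings side by side, below is a hypothetical hand transliteration of each into Julia. Each comment names the instruction(s) that line stands in for. It assumes, as in the assembly, that the length is loaded first and that the loop counter is a 0-based element offset. It illustrates the control flow only; the compiler did not produce this code. On the path that reaches the loop the length is at least 2, so `max(n, 2) == n`, and both functions compute the same left-to-right sum.

```julia
# Hypothetical transliteration of the first listing (mysum).
function mysum_asm_like(x::Vector{Float64})
    n = length(x)                # mov rdx, qword ptr [rdi + 8]
    n == 0 && return 0.0         # test/je L62 -> vxorps xmm0 (zero), ret
    s = 0.0 + x[1]               # vxorpd xmm0; vaddsd with the first element
    n == 1 && return s           # cmp rdx, 1; je L61
    bound = max(n, 2)            # cmp rdx, 2; mov ecx, 2; cmova rcx, rdx
    i = 1                        # mov edx, 1 (0-based offset of 2nd element)
    while true                   # L48:
        s += x[i + 1]            # vaddsd with [rax + 8*rdx]
        i += 1                   # inc rdx
        i == bound && break      # cmp rcx, rdx; jne L48
    end
    return s
end

# Hypothetical transliteration of the second listing (mysum_check).
function mysum_check_asm_like(x::Vector{Float64})
    n = length(x)                # mov rax, qword ptr [rdi + 8]
    n == 0 && return 0.0         # test/je L46
    s = 0.0 + x[1]               # vxorpd; vaddsd with the first element
    n == 1 && return s           # cmp rax, 1; je L45
    i = 1                        # mov edx, 1
    while true                   # L32:
        s += x[i + 1]            # vaddsd with [rcx + 8*rdx]
        i += 1                   # inc rdx
        i == n && break          # cmp rax, rdx; jne L32 (no max needed)
    end
    return s
end
```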
I wouldn’t worry too much about that difference…
Consider that a CPU running at 4 GHz has 4 clock cycles per nanosecond, and there are 1 billion nanoseconds per second.
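In each clock cycle, a modern CPU can often execute 3 or more instructions.
So a few extra instructions executed once, outside the loop, cost on the order of a nanosecond at most; whatever difference this makes, it'll be negligible.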
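The same back-of-the-envelope arithmetic, written out as a small Julia script. It assumes the three setup instructions that only the first listing has (`cmp` / `mov` / `cmova`) and a throughput of 3 instructions per cycle. This ignores dependencies between those instructions, so the real cost could be a cycle or two rather than one.

```julia
clock_hz = 4.0e9                          # 4 GHz, as in the text
ns_per_second = 1.0e9
ns_per_cycle = ns_per_second / clock_hz   # 0.25 ns

# Assumptions for illustration: 3 extra setup instructions,
# retired at 3 instructions per cycle (throughput only).
extra_instructions = 3
instructions_per_cycle = 3
extra_cycles = extra_instructions / instructions_per_cycle   # 1.0
extra_ns = extra_cycles * ns_per_cycle                       # 0.25 ns
```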
Maybe I'm crazy enough to worry about every stray instruction, but also note that less isn't necessarily better. As an extreme example, Julia's broadcasting generates a lot of code that's fast for different possibilities (e.g., particular dimensions having size 1), plus a few checks to pick whichever version is fastest at run time.
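Here is a toy version of "more code plus a few checks." The hypothetical function below checks whether `b` has length 1, meaning it should be broadcast like a scalar, and then runs a loop specialized for that case. It is only a sketch of the idea under that assumption. It is not how Base's broadcasting is actually implemented.

```julia
# Hypothetical illustration: compute out .= a .+ b, choosing a specialized
# loop when `b` has length 1. NOT Base's broadcast implementation.
function add_bcast!(out::Vector{Float64}, a::Vector{Float64}, b::Vector{Float64})
    if length(b) == 1
        bb = b[1]                         # hoist the single value out of the loop
        @inbounds for i in eachindex(out, a)   # eachindex checks matching sizes
            out[i] = a[i] + bb
        end
    else
        @inbounds for i in eachindex(out, a, b)
            out[i] = a[i] + b[i]
        end
    end
    return out
end

add_bcast!(zeros(3), [1.0, 2.0, 3.0], [10.0])          # [11.0, 12.0, 13.0]
add_bcast!(zeros(3), [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])  # [2.0, 3.0, 4.0]
```

The branch costs one length comparison. In exchange, the common "scalar-like" case gets a loop that reads `b` only once.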