Is it worth it to check if a collection is empty before iterating?

Is it worth checking if a collection is empty before iterating? That is, is there any argument for preferring option 2 in the following example?

# Collection that may or may not be empty
data = filter(x -> x > 0.75, rand(4))

# Option 1
for i in data
      println("found one")
end

# Option 2
if !isempty(data)
      for i in data
            println("found one")
      end
end

I don’t think so. (there might be some weird type where option 2 is faster, but I kind of doubt it).

Generally, the answer is no, because the loop will only iterate if it isn’t empty anyway.
It may help to look at the code the compiler generates:

julia> y = Float64[];

julia> function mysum(x)
           s = zero(eltype(x))
           for xᵢ ∈ x
               s += xᵢ
           end
           s
       end
mysum (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel mysum(x)
        .text
        mov     rdx, qword ptr [rdi + 8]
        test    rdx, rdx
        je      L62
        mov     rax, qword ptr [rdi]
        vxorpd  xmm0, xmm0, xmm0
        vaddsd  xmm0, xmm0, qword ptr [rax]
        cmp     rdx, 1
        je      L61
        cmp     rdx, 2
        mov     ecx, 2
        cmova   rcx, rdx
        mov     edx, 1
        nop     dword ptr [rax]
L48:
        vaddsd  xmm0, xmm0, qword ptr [rax + 8*rdx]
        inc     rdx
        cmp     rcx, rdx
        jne     L48
L61:
        ret
L62:
        vxorps  xmm0, xmm0, xmm0
        ret
        nop     word ptr cs:[rax + rax]

The thing to look at here is the start of the function:

        mov     rdx, qword ptr [rdi + 8]
        test    rdx, rdx
        je      L62

It loads the length into the rdx register, and then tests it against itself. Test performs &.
Next, if the test returned 0, it je (jumps) to L62. Bitwise &-ing a number with itself only returns 0 if the number itself is zero, so this is a check for if the array is empty, and then immediately a jump.
Looking at L62:

L62:
        vxorps  xmm0, xmm0, xmm0
        ret
        nop     word ptr cs:[rax + rax]

It xors a register with itself to set it to zero (because the function just returns 0 if empty), and returns.

What if we added the check?

julia> @code_native debuginfo=:none syntax=:intel mysum_check(x)
        .text
        mov     rax, qword ptr [rdi + 8]
        test    rax, rax
        je      L46
        mov     rcx, qword ptr [rdi]
        vxorpd  xmm0, xmm0, xmm0
        vaddsd  xmm0, xmm0, qword ptr [rcx]
        cmp     rax, 1
        je      L45
        mov     edx, 1
        nop
L32:
        vaddsd  xmm0, xmm0, qword ptr [rcx + 8*rdx]
        inc     rdx
        cmp     rax, rdx
        jne     L32
L45:
        ret
L46:
        vxorps  xmm0, xmm0, xmm0
        ret
        nop     word ptr cs:[rax + rax]

This code is a little shorter, because in the previous version it first added a single number to 0, checked if the loop was of length 1, and immediately returned if so. But here it instead goes straight into the loop.

I wouldn’t worry too much about that difference…

Consider that a CPU running at 4 GHz has 4 clock cycles per nanosecond, and there are 1 billion nanoseconds per second.
In each clock cycle, a modern CPU can often execute 3 ore more instructions.
So whatever difference this makes, it’ll be negligible.

Maybe I’m crazy enough to worry about every stray instruction, but also note that less isn’t necessarily better. As an extreme example, Julia’s broadcasting generates a lot of code that’s fast for different possibilities (e.g., different particular sizes being 1), plus a few checks to run whatever is fastest.

5 Likes

Good to know! I ended up having to implement the test anyway because my function involves taking unions, giving rise to the following issue:

julia> X = Dict(4 => 7, 3 => 6, 1 => 3)
Dict{Int64,Int64} with 3 entries:
  4 => 7
  3 => 6
  1 => 3

julia> union(X...)                 # Works fine when X is nonempty
5-element Array{Int64,1}:
 4
 7
 3
 6
 1

julia> union(Dict()...)            # Error when X is empty
ERROR: MethodError: no method matching union()
Closest candidates are:
  union(::Pkg.Types.VersionSpec, ::Pkg.Types.VersionSpec) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\versions.jl:252
  union(::Base.OneTo, ::Base.OneTo) at range.jl:785
  union(::BitSet, ::Any...) at bitset.jl:312
  ...
Stacktrace:
 [1] top-level scope at REPL[10]:1

I would recommend doing

julia> X = Dict(4 => 7, 3 => 6, 1 => 3)
Dict{Int64, Int64} with 3 entries:
  4 => 7
  3 => 6
  1 => 3

julia> union(keys(X), values(X))
Set{Int64} with 5 elements:
  4
  6
  7
  3
  1

instead. The dynamic splat will generally be type unstable. Plus, if you’re working with unions, odds are getting a Set is preferable to getting a Vector/Array.
This also works when empty:

julia> Y = Dict();

julia> union(keys(Y), values(Y))
Set{Any}()
3 Likes

And if empty dictionaries are initialized with their types (which one should mostly do), then you also get the correct output container type:

julia> X = Dict{Int,Int}();

julia> union(keys(X), values(X))
Set{Int64}()
4 Likes