Testing one loop with two operations or a two loops

I was poking around a few discussion forums about what the best coding practice is. I have a few operations that I put into one loop, similar to

using BenchmarkTools

function single(A::Array{Int64,1})
  @simd for i = 1:length(A)
    @inbounds A[i] = 2
    @inbounds A[i] *= 3
  end
end

function double(A::Array{Int64,1})
  @simd for i = 1:length(A)
    @inbounds A[i] = 2
  end
  @simd for i = 1:length(A)
    @inbounds A[i] *= 3
  end
end

m = 10_000_000
A = Array{Int64,1}(undef,m)

@btime single(A)
@btime double(A)

The second function double performs the same computation but in two loops.

My question is whether it is always better to put two operations into a single loops (single) or when it is sometimes better to separate them (double), especially as it relates to Julia.

One school of thought is that the compiler can optimize the loop structure in terms of memory so that it would be more efficient with single, and when I perform the time test that appears to be the case:

julia> @btime single(A)
    6.333 ms (0 allocations: 0 bytes)
julia> @btime double(A)
   13.508 ms (0 allocations: 0 bytes)

which is a factor of 2 faster and implies that there is some efficient use of memory that is being sorted out by the compiler.

A more complicated case where the memory is not so clean would be

using BenchmarkTools

function single(A::Array{Int64,1},B::Array{Int64,1})
  @simd for i = 1:length(A)
    @inbounds A[i] = 2
    @inbounds B[i] = 3
  end
end

function double(A::Array{Int64,1},B::Array{Int64,1})
  @simd for i = 1:length(A)
    @inbounds A[i] = 2
  end
  @simd for i = 1:length(B)
    @inbounds B[i] = 3
  end
end

m = 10_000_000
A = Array{Int64,1}(undef,m)
B = Array{Int64,1}(undef,m)

@btime single(A,B)
@btime double(A,B)

giving

julia> @btime single(A)
    11.617 ms (0 allocations: 0 bytes)
julia> @btime double(B)
   12.624 ms (0 allocations: 0 bytes)

which is nearly similar between the two functions. I assume that the compiler’s operations on the memory make this a more challenging example because the second vector is more distant in memory, but I was expecting that the double example would be faster, prompting me to ask what the general rules are for using one or two loops.

I didn’t find anything specific to Julia when I looked (only python and java…and those had conflicting answers on what the best general practice is). Am I missing anything here? Are there any special considerations for Julia? Is it best to always just try it in more complicated code? Is the time difference always so minimal that this doesn’t matter?

These are really bad benchmarks, since in both cases, the compiler could really easily (in theory) fuse operations to ignore the first one. (ie compile the first into A[i] =6). That said, in general 1 loop will be better since you will only be passing over the memory once. Also, in Julia, broadcasting is often an idiomatic way of expressing these types of operations more cleanly (and with the same performance).

4 Likes

Yeah, I think this is the important point–due to caching, it is generally more efficient to make a single pass over an array rather than multiple passes because it requires loading the array into the cache only once rather than multiple times. This basic principle is not Julia-specific in any way–the same rule of thumb should apply in any fast language running on any modern CPU.

8 Likes

This is known as improving the temporal locality of your program.

4 Likes