Testing one loop with two operations or a two loops

swishmas · March 11, 2021, 4:30pm

I was poking around a few discussion forums about what the best coding practice is. I have a few operations that I put into one loop, similar to

using BenchmarkTools

function single(A::Array{Int64,1})
  @simd for i = 1:length(A)
    @inbounds A[i] = 2
    @inbounds A[i] *= 3
  end
end

function double(A::Array{Int64,1})
  @simd for i = 1:length(A)
    @inbounds A[i] = 2
  end
  @simd for i = 1:length(A)
    @inbounds A[i] *= 3
  end
end

m = 10_000_000
A = Array{Int64,1}(undef,m)

@btime single(A)
@btime double(A)

The second function double performs the same computation but in two loops.

My question is whether it is always better to put two operations into a single loops (single) or when it is sometimes better to separate them (double), especially as it relates to Julia.

One school of thought is that the compiler can optimize the loop structure in terms of memory so that it would be more efficient with single, and when I perform the time test that appears to be the case:

julia> @btime single(A)
    6.333 ms (0 allocations: 0 bytes)
julia> @btime double(A)
   13.508 ms (0 allocations: 0 bytes)

which is a factor of 2 faster and implies that there is some efficient use of memory that is being sorted out by the compiler.

A more complicated case where the memory is not so clean would be

using BenchmarkTools

function single(A::Array{Int64,1},B::Array{Int64,1})
  @simd for i = 1:length(A)
    @inbounds A[i] = 2
    @inbounds B[i] = 3
  end
end

function double(A::Array{Int64,1},B::Array{Int64,1})
  @simd for i = 1:length(A)
    @inbounds A[i] = 2
  end
  @simd for i = 1:length(B)
    @inbounds B[i] = 3
  end
end

m = 10_000_000
A = Array{Int64,1}(undef,m)
B = Array{Int64,1}(undef,m)

@btime single(A,B)
@btime double(A,B)

giving

julia> @btime single(A)
    11.617 ms (0 allocations: 0 bytes)
julia> @btime double(B)
   12.624 ms (0 allocations: 0 bytes)

which is nearly similar between the two functions. I assume that the compiler’s operations on the memory make this a more challenging example because the second vector is more distant in memory, but I was expecting that the double example would be faster, prompting me to ask what the general rules are for using one or two loops.

I didn’t find anything specific to Julia when I looked (only python and java…and those had conflicting answers on what the best general practice is). Am I missing anything here? Are there any special considerations for Julia? Is it best to always just try it in more complicated code? Is the time difference always so minimal that this doesn’t matter?

Oscar_Smith · March 11, 2021, 4:54pm

These are really bad benchmarks, since in both cases, the compiler could really easily (in theory) fuse operations to ignore the first one. (ie compile the first into A[i] =6). That said, in general 1 loop will be better since you will only be passing over the memory once. Also, in Julia, broadcasting is often an idiomatic way of expressing these types of operations more cleanly (and with the same performance).

rdeits · March 11, 2021, 5:04pm

Yeah, I think this is the important point–due to caching, it is generally more efficient to make a single pass over an array rather than multiple passes because it requires loading the array into the cache only once rather than multiple times. This basic principle is not Julia-specific in any way–the same rule of thumb should apply in any fast language running on any modern CPU.

stevengj · March 11, 2021, 6:23pm

This is known as improving the temporal locality of your program.

Topic		Replies	Views
Operating on many arrays of the same size: one loop, many loops, or broadcasting? Performance question , memory , loops	3	915	March 27, 2020
Why fewer memory allocations does not necessarily suggest higher speed New to Julia performance , memory-allocation	5	792	June 6, 2021
Comparison between Loops and Vectorization in Julia General Usage	3	548	May 3, 2024
Faster sum loop when looping through vector multiple times than once Performance	5	291	September 29, 2023
Efficient looping statement in Julia Performance	15	5608	January 10, 2019

Testing one loop with two operations or a two loops

Related topics