I was poking around a few discussion forums about what the best coding practice is. I have a few operations that I put into one loop, similar to
using BenchmarkTools
function single(A::Array{Int64,1})
@simd for i = 1:length(A)
@inbounds A[i] = 2
@inbounds A[i] *= 3
end
end
function double(A::Array{Int64,1})
@simd for i = 1:length(A)
@inbounds A[i] = 2
end
@simd for i = 1:length(A)
@inbounds A[i] *= 3
end
end
m = 10_000_000
A = Array{Int64,1}(undef,m)
@btime single(A)
@btime double(A)
The second function double
performs the same computation but in two loops.
My question is whether it is always better to put two operations into a single loops (single
) or when it is sometimes better to separate them (double
), especially as it relates to Julia.
One school of thought is that the compiler can optimize the loop structure in terms of memory so that it would be more efficient with single
, and when I perform the time test that appears to be the case:
julia> @btime single(A)
6.333 ms (0 allocations: 0 bytes)
julia> @btime double(A)
13.508 ms (0 allocations: 0 bytes)
which is a factor of 2 faster and implies that there is some efficient use of memory that is being sorted out by the compiler.
A more complicated case where the memory is not so clean would be
using BenchmarkTools
function single(A::Array{Int64,1},B::Array{Int64,1})
@simd for i = 1:length(A)
@inbounds A[i] = 2
@inbounds B[i] = 3
end
end
function double(A::Array{Int64,1},B::Array{Int64,1})
@simd for i = 1:length(A)
@inbounds A[i] = 2
end
@simd for i = 1:length(B)
@inbounds B[i] = 3
end
end
m = 10_000_000
A = Array{Int64,1}(undef,m)
B = Array{Int64,1}(undef,m)
@btime single(A,B)
@btime double(A,B)
giving
julia> @btime single(A)
11.617 ms (0 allocations: 0 bytes)
julia> @btime double(B)
12.624 ms (0 allocations: 0 bytes)
which is nearly similar between the two functions. I assume that the compiler’s operations on the memory make this a more challenging example because the second vector is more distant in memory, but I was expecting that the double
example would be faster, prompting me to ask what the general rules are for using one or two loops.
I didn’t find anything specific to Julia when I looked (only python and java…and those had conflicting answers on what the best general practice is). Am I missing anything here? Are there any special considerations for Julia? Is it best to always just try it in more complicated code? Is the time difference always so minimal that this doesn’t matter?