Inspired by the topic "map vs loops vs broadcasts", I wanted to run my own tests on the speed of these approaches. As it turns out, the optimised for loop, map, and vectorisation are all roughly in the same ballpark.
I then played the same game with a preallocated result array and found significant differences:
using BenchmarkTools

# Explicit loop writing into the preallocated array e
function forloop(e, v)
    @simd for i in eachindex(v)
        @inbounds e[i] = 2*v[i]^2 + v[i] + 5
    end
end

# In-place map and in-place broadcast variants
fmap(e, v) = map!(x -> 2x^2 + x + 5, e, v)
fbcs(e, v) = @. e = 2*v^2 + v + 5

v = rand(10000)
e = similar(v)

@btime for i in 1:100
    forloop(e, v)
end
@btime for i in 1:100
    fmap(e, v)
end
@btime for i in 1:100
    fbcs(e, v)
end
It’s not necessary to put a loop around the code you want to benchmark (@btime already does that for you).
It is necessary to interpolate (with $) any variables you use inside the code you are benchmarking. Otherwise you’re timing the lookup of a global variable at each function call, which will affect your results (see the README of JuliaCI/BenchmarkTools.jl on GitHub). I wouldn’t expect it to affect the relative performance in this case, but it’s still worth getting right.
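For reference, here is a minimal sketch of how the benchmark from the first post might look with both points applied: no outer loop, and `e` and `v` interpolated with `$`. The function definitions are copied from the original post; nothing here is meant as the definitive way to set it up, and the timings will of course depend on your machine and Julia version.

```julia
using BenchmarkTools

# Same three implementations as in the original post
function forloop(e, v)
    @simd for i in eachindex(v)
        @inbounds e[i] = 2*v[i]^2 + v[i] + 5
    end
end
fmap(e, v) = map!(x -> 2x^2 + x + 5, e, v)
fbcs(e, v) = @. e = 2*v^2 + v + 5

v = rand(10000)
e = similar(v)

# @btime runs the expression many times on its own, so no outer loop is needed.
# $e and $v splice the values into the benchmark, so the global lookup is not timed.
@btime forloop($e, $v)
@btime fmap($e, $v)
@btime fbcs($e, $v)
```

Since each call now processes the array once instead of 100 times, the reported times should be roughly a hundredth of the looped version, which makes them easier to compare per call.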
Thanks for the advice regarding the interpolation and the loop. I will certainly use this in my next benchmark run. As you have already shown, this does not affect relative performance, so I will keep my implementation in the post as is.
Once the performance bottleneck here is figured out, should one open a GitHub issue?
While wondering whether I should use map or a for loop in a program, my elementary test below showed that map was much slower than the for loop, even when I used a no-check-bounds kernel in IJulia, which I installed by doing
julia> using IJulia
julia> installkernel("Julia no-check-bounds", "--check-bounds=no")