Below I include code for two simple functions that compute averages of pairs of adjacent values in an input 1D vector. Function f2 is almost twice as fast as f3 - I’d like to understand why! If anything, I would have expected them to be the same or for f3 to be better, since I’d think the Julia system would automagically be able to optimize a vector function with lots of dots. Allocations are the same, so there is something else going on.
If I try @code_llvm the output for f3 is extremely long though…
julia> function f2(d::Array{Float64,1})
[ (d[i]+d[i+1])/2.0 for i in 1:length(d)-1]
end
f2 (generic function with 1 method)
julia> function f3(d::Array{Float64,1})
(d[1:end-1] .+ d[2:end]) ./= 2.0
end
f3 (generic function with 1 method)
julia> a = [i for i in 1.0:100.0];
julia> using BenchmarkTools
julia> @benchmark f2($a)
BenchmarkTools.Trial:
memory estimate: 944 bytes
allocs estimate: 3
--------------
minimum time: 155.671 ns (0.00% GC)
median time: 182.914 ns (0.00% GC)
mean time: 191.812 ns (5.42% GC)
maximum time: 1.047 μs (72.99% GC)
--------------
samples: 10000
evals/sample: 800
julia> @benchmark f3($a)
BenchmarkTools.Trial:
memory estimate: 2.63 KiB
allocs estimate: 3
--------------
minimum time: 223.421 ns (0.00% GC)
median time: 304.436 ns (0.00% GC)
mean time: 363.418 ns (6.18% GC)
maximum time: 1.563 μs (77.07% GC)
--------------
samples: 10000
evals/sample: 535
Ah, perfect - thanks. This is really helpful, because I’ve also been trying to work to learn how to intelligently use views so your post helps me learn two things at the same time!
the behavior of functions f4 (correcting the typo “=” sign on the second) is interesting on my machine - the median benchmark time is consistently 25% faster than for f2, but the mean for f2 is a bit better (just a few percent though).
I think this means that the division by two has to be done on a separate pass, acting in-place on the newly created array, instead of being done along with the .+. A bigger cost at larger sizes, but visible here:
julia> @btime [ ($a[i]+$a[i+1])/2 for i in 1:length($a)-1 ];
135.400 ns (1 allocation: 896 bytes)
julia> @btime [ @inbounds($a[i]+$a[i+1])/2 for i in 1:length($a)-1 ];
78.197 ns (1 allocation: 896 bytes)