# Comparing performance of 2 simple averaging functions - why is one faster?

Below I include code for two simple functions that compute averages of pairs of adjacent values in an input 1D vector. Function `f2` is almost twice as fast as `f3` - I’d like to understand why! If anything, I would have expected them to be the same or for `f3` to be better, since I’d think the Julia system would automagically be able to optimize a vector function with lots of dots. Allocations are the same, so there is something else going on.

If I try `@code_llvm` the output for f3 is extremely long though…

``````julia> function f2(d::Array{Float64,1})
[ (d[i]+d[i+1])/2.0 for i in 1:length(d)-1]
end
f2 (generic function with 1 method)

julia> function f3(d::Array{Float64,1})
(d[1:end-1] .+ d[2:end]) ./= 2.0
end
f3 (generic function with 1 method)

julia> a = [i for i in 1.0:100.0];

julia> using BenchmarkTools

julia> @benchmark f2(\$a)
BenchmarkTools.Trial:
memory estimate:  944 bytes
allocs estimate:  3
--------------
minimum time:     155.671 ns (0.00% GC)
median time:      182.914 ns (0.00% GC)
mean time:        191.812 ns (5.42% GC)
maximum time:     1.047 μs (72.99% GC)
--------------
samples:          10000
evals/sample:     800

julia> @benchmark f3(\$a)
BenchmarkTools.Trial:
memory estimate:  2.63 KiB
allocs estimate:  3
--------------
minimum time:     223.421 ns (0.00% GC)
median time:      304.436 ns (0.00% GC)
mean time:        363.418 ns (6.18% GC)
maximum time:     1.563 μs (77.07% GC)
--------------
samples:          10000
evals/sample:     535
``````

`d[1:end-1]` and `d[2:end]` are both copies. If you instead write

``````function f4(d::Array{Float64,1})
@views (d[1:end-1] .+ d[2:end]) ./= 2.0
end
``````

it should be as fast. For a generic version, consider

``````function f4(d::Vector)
two_T = one(eltype(d))+one(eltype(d))
@views (d[1:end-1] .+ d[2:end]) ./= two_T =
end
``````

it should be just as fast, but will work for arbitrarily typed `Vectors`

3 Likes

Ah, perfect - thanks. This is really helpful, because I’ve also been trying to work to learn how to intelligently use `views` so your post helps me learn two things at the same time!

the behavior of functions `f4` (correcting the typo “=” sign on the second) is interesting on my machine - the median benchmark time is consistently 25% faster than for `f2`, but the mean for `f2` is a bit better (just a few percent though).

I’m pretty sure it’s no better than just dividing by the integer `2`. Do you have any example where `2` falls short?

1 Like

I think this means that the division by two has to be done on a separate pass, acting in-place on the newly created array, instead of being done along with the `.+`. A bigger cost at larger sizes, but visible here:

``````julia> @btime @views (\$a[1:end-1] .+ \$a[2:end]) ./= 2;
92.228 ns (1 allocation: 896 bytes)

julia> @btime @views (\$a[1:end-1] .+ \$a[2:end]) ./ 2;
72.519 ns (1 allocation: 896 bytes)

julia> @btime @views (\$a[1:end-1] .+ \$a[2:end]) .* (1/2);
72.702 ns (1 allocation: 896 bytes)
``````

Also, `f2` benefits from `@inbounds`:

``````julia> @btime [ (\$a[i]+\$a[i+1])/2 for i in 1:length(\$a)-1 ];
135.400 ns (1 allocation: 896 bytes)

julia> @btime [ @inbounds(\$a[i]+\$a[i+1])/2 for i in 1:length(\$a)-1 ];
78.197 ns (1 allocation: 896 bytes)
``````
1 Like

Good tip - thanks. The other examples are insightful too!