Comparing performance of 2 simple averaging functions - why is one faster?

lwhitefox · August 31, 2020, 6:16pm

Below I include code for two simple functions that compute averages of pairs of adjacent values in an input 1D vector. Function f2 is almost twice as fast as f3, and I’d like to understand why! If anything, I would have expected them to be the same or for f3 to be better, since I’d think the Julia system would automagically be able to optimize a vector function with lots of dots. The allocation counts are the same (although f3 uses more memory), so there is something else going on.

If I try @code_llvm the output for f3 is extremely long though…

julia> function f2(d::Array{Float64,1})
           [ (d[i]+d[i+1])/2.0 for i in 1:length(d)-1]
       end
f2 (generic function with 1 method)

julia> function f3(d::Array{Float64,1})
           (d[1:end-1] .+ d[2:end]) ./= 2.0
       end
f3 (generic function with 1 method)

julia> a = [i for i in 1.0:100.0];

julia> using BenchmarkTools

julia> @benchmark f2($a)
BenchmarkTools.Trial: 
  memory estimate:  944 bytes
  allocs estimate:  3
  --------------
  minimum time:     155.671 ns (0.00% GC)
  median time:      182.914 ns (0.00% GC)
  mean time:        191.812 ns (5.42% GC)
  maximum time:     1.047 μs (72.99% GC)
  --------------
  samples:          10000
  evals/sample:     800

julia> @benchmark f3($a)
BenchmarkTools.Trial: 
  memory estimate:  2.63 KiB
  allocs estimate:  3
  --------------
  minimum time:     223.421 ns (0.00% GC)
  median time:      304.436 ns (0.00% GC)
  mean time:        363.418 ns (6.18% GC)
  maximum time:     1.563 μs (77.07% GC)
  --------------
  samples:          10000
  evals/sample:     535

Oscar_Smith · August 31, 2020, 6:20pm

In f3, d[1:end-1] and d[2:end] each allocate a copy of the data. If you instead write

function f4(d::Array{Float64,1})
    @views (d[1:end-1] .+ d[2:end]) ./= 2.0
end

it should be as fast as f2. For a generic version, consider

function f4(d::Vector)
    two_T = one(eltype(d))+one(eltype(d))
    @views (d[1:end-1] .+ d[2:end]) ./= two_T
end

It should be just as fast, but will work for arbitrarily typed Vectors.

lwhitefox · August 31, 2020, 6:28pm

Ah, perfect - thanks. This is really helpful, because I’ve also been trying to learn how to use views intelligently, so your post teaches me two things at once!

The behavior of the two f4 functions is interesting on my machine: the median benchmark time is consistently 25% faster than for f2, but the mean for f2 is a bit better (just a few percent though).
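
For anyone else learning views, the copy-versus-view distinction mentioned above can be checked directly. This is only a minimal sketch: the name pair_means is hypothetical (it is not from the thread), and it uses the non-mutating ./ instead of ./= for simplicity.

```julia
# Sketch: a slice allocates a copy, a view aliases the parent array.
d = collect(1.0:5.0)

s = d[1:end-1]         # plain slicing allocates a new Vector (a copy)
v = @view d[1:end-1]   # a SubArray that shares memory with d

d[1] = 100.0           # mutate the parent array

s[1]   # 1.0   (the copy does not see the change)
v[1]   # 100.0 (the view does)

# Hypothetical helper: @views turns every slice in the expression into a view,
# so the only allocation left is the result array.
pair_means(d::AbstractVector) = @views (d[1:end-1] .+ d[2:end]) ./ 2
```

Because a view aliases its parent, writing into v would also modify d, which is worth keeping in mind before adding @views to code that mutates its slices.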

DNF · August 31, 2020, 6:51pm

I’m pretty sure the two_T trick is no better than just dividing by the integer 2. Do you have any example where 2 falls short?
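
A quick way to see this is to compare element types. The sketch below relies on Julia's standard promotion rules; the helper names halve_pairs and halve_pairs_float are hypothetical and not from the thread.

```julia
# Hypothetical helpers: identical except for the divisor.
halve_pairs(d::AbstractVector)       = @views (d[1:end-1] .+ d[2:end]) ./ 2
halve_pairs_float(d::AbstractVector) = @views (d[1:end-1] .+ d[2:end]) ./ 2.0

x32 = Float32[1, 2, 4]
eltype(halve_pairs(x32))        # Float32: the integer 2 promotes to the element type
eltype(halve_pairs_float(x32))  # Float64: the literal 2.0 widens the result

r = [1//2, 3//2, 5//2]
halve_pairs(r)                  # stays Rational: [1//1, 2//1]
```

So in these cases the integer literal already adapts to the element type, and it is the Float64 literal 2.0 that forces a conversion.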

mcabbott · August 31, 2020, 7:04pm

I think the ./= means that the division by two has to be done on a separate pass, acting in-place on the newly created array, instead of being done along with the .+. A bigger cost at larger sizes, but visible here:

julia> @btime @views ($a[1:end-1] .+ $a[2:end]) ./= 2;
  92.228 ns (1 allocation: 896 bytes)

julia> @btime @views ($a[1:end-1] .+ $a[2:end]) ./ 2;
  72.519 ns (1 allocation: 896 bytes)

julia> @btime @views ($a[1:end-1] .+ $a[2:end]) .* (1/2);
  72.702 ns (1 allocation: 896 bytes)
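
Written out by hand, the difference between the first form and the other two might look roughly like the sketch below. This is my reading of the behavior described above, not a dump of what the compiler actually generates, and the names two_pass and fused are hypothetical.

```julia
a = collect(1.0:100.0)

# Two passes: materialize the sums, then divide that array in place.
function two_pass(a)
    tmp = @views (a[1:end-1] .+ a[2:end])  # pass 1: allocate tmp and fill it with sums
    tmp ./= 2                              # pass 2: loop over tmp again to divide
    return tmp
end

# One pass: the dotted .+ and ./ fuse into a single loop over the output.
fused(a) = @views (a[1:end-1] .+ a[2:end]) ./ 2

two_pass(a) == fused(a)   # true: same values, one fewer traversal in fused
```

Both versions allocate only the result array, which matches the identical allocation figures in the timings; the extra cost of the in-place form is the second traversal.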

Also, f2 benefits from @inbounds:

julia> @btime [ ($a[i]+$a[i+1])/2 for i in 1:length($a)-1 ];
  135.400 ns (1 allocation: 896 bytes)

julia> @btime [ @inbounds($a[i]+$a[i+1])/2 for i in 1:length($a)-1 ];
  78.197 ns (1 allocation: 896 bytes)
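
The same idea as an explicit loop, as a rough sketch. The function name pair_means_loop is hypothetical, and the signature is restricted to a plain 1-based Vector because @inbounds is only safe when the loop range itself guarantees that every index is valid.

```julia
# Sketch: adjacent-pair means with bounds checks removed.
function pair_means_loop(d::Vector{T}) where {T<:AbstractFloat}
    n = length(d)
    out = Vector{T}(undef, max(n - 1, 0))
    # eachindex(out) is 1:(n-1), so both d[i] and d[i+1] are in range
    # for every i; that is what makes @inbounds safe here.
    @inbounds for i in eachindex(out)
        out[i] = (d[i] + d[i+1]) / 2
    end
    return out
end

pair_means_loop([1.0, 3.0, 5.0])   # [2.0, 4.0]
```

If the index range could ever run past the end of d, @inbounds would turn a clean BoundsError into undefined behavior, so it is best limited to loops like this one where the range is derived from the array lengths.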

lwhitefox · August 31, 2020, 7:06pm

Good tip - thanks. The other examples are insightful too!
