Identical functions repeated benchmarks show systematic differences

Nonetheless, I think your question raises a very good point. Most of the time we use BenchmarkTools not because we want to know how fast A is, but whether A is faster than B and by how much. In my opinion, a very good addition to BenchmarkTools would be a macro that compares A vs B vs … X directly, instead of us guessing whether one is faster than the others from their individual statistics. Such a macro could also apply internal bias reduction (reloading A and B, etc.), and running it for a long time should also account for potential bias from the machine/OS as a whole.
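A minimal sketch of what such a comparison helper could look like, built on the judge/ratio machinery that BenchmarkTools already provides (the name compare, its keyword, and the returned tuple are my own placeholders, not an existing API):

using BenchmarkTools

# Benchmark two zero-argument functions and summarize which one is faster.
# `estimator` picks the point estimate used for the comparison (minimum by default).
function compare(f, g; estimator = minimum)
    tf = @benchmark $f()                       # full trial for f
    tg = @benchmark $g()                       # full trial for g
    j  = judge(estimator(tf), estimator(tg))   # :improvement / :invariant / :regression
    return (judgement = j, ratio = ratio(estimator(tf), estimator(tg)))
end

With the A and B defined below, compare(A, B) would then report whether A is judged an improvement, a regression, or invariant relative to B.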

For now, following the statistics-on-statistics idea, we can run something like this for a long time:

A() = for i in 1:1000 sin(i) end
B() = for i in 1:999 sin(i) end

using BenchmarkTools

n = 10
d = Float64[]            # per-run differences between the times of A and B
for i in 1:n
    println(i)
    push!(d, (@belapsed A()) - (@belapsed B()))   # @belapsed returns the minimum time in seconds
end
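To guard against the machine/OS drifting over the run, a small variant (my own tweak, not part of the original post) alternates the order in which A and B are measured on each iteration, so a slow drift does not systematically penalize the same function:

d_alt = Float64[]
for i in 1:n
    if isodd(i)                    # measure A first on odd iterations …
        tA = @belapsed A()
        tB = @belapsed B()
    else                           # … and B first on even iterations
        tB = @belapsed B()
        tA = @belapsed A()
    end
    push!(d_alt, tA - tB)
end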

And then analyze the results, for example:

using HypothesisTests, Statistics, Plots
plot(d, label = "A-B")                                     # per-run differences
plot!([0], seriestype = :hline, label = "H0 μ=0")          # null hypothesis: no difference
plot!([mean(d)], seriestype = :hline, label = "μA - μB")   # observed mean difference

julia> pvalue(OneSampleTTest(mean(d),std(d),n))
0.004007649672446793

[plot: per-run differences A-B with the H0 (μ = 0) and mean-difference reference lines]
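With only n = 10 differences, and timing distributions that are often skewed, the t-test's normality assumption is a bit shaky; as a rough cross-check (my addition, not part of the original analysis) the same HypothesisTests package offers a nonparametric signed-rank test, and the t-test can also be fed the raw vector of differences directly:

pvalue(SignedRankTest(d))     # nonparametric test that A-B is symmetric around 0
pvalue(OneSampleTTest(d))     # same t-test as above, computed from the raw differences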

Also, since you’re working on this and you raised the problem, may I suggest opening an issue with the BenchmarkTools team? I’d support the idea of a macro like @benchmark A B …
