Identical functions in repeated benchmarks show systematic differences

I don’t think anybody is claiming that BenchmarkTools introduces any artifacts, rather that BenchmarkTools does not account for machine/OS artifacts when comparing benchmarks.

What users typically want when using BenchmarkTools is to answer these questions:

  1. Is A faster than B?
  2. By how much?
  3. How confident are we in those differences?

None of these questions are directly addressed by BenchmarkTools as of today.

`judge` cannot tell you whether there is a real difference, and it cannot even tell you in which direction the real difference goes.
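To make that concrete, here is a minimal sketch (the benchmarked expression is just an example) of what `judge` does today: it compares two point estimates against a fixed relative tolerance and labels the result, without attaching any confidence to that label.

```julia
using BenchmarkTools

x = rand(1000)

# Benchmark the same expression twice; any difference between the two
# trials is machine/OS noise, not a real change in the code.
a = @benchmark sum($x)
b = @benchmark sum($x)

# judge compares two point estimates (the minima here) against a fixed
# relative time tolerance (5% by default) and reports :improvement,
# :regression, or :invariant. It gives no probability or confidence
# level for that verdict, and the verdict can flip from run to run
# when the true difference is small.
judge(minimum(a), minimum(b))
```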

My point in using the deepcopy/nice settings was not to prove that that’s the best way to do things (in fact, I thought of a few other ways to diminish the effect of machine/OS artifacts that I didn’t use in that example), but rather that yes, we can reliably detect those micro differences (or, in this case, correctly fail to detect them when there are none).
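As a rough sketch of what I mean, something along these lines could be layered on top of BenchmarkTools: interleave many short trials of the two candidates on freshly deepcopied inputs, then summarise (or statistically test) the collected estimates. The functions, trial counts, and the `seconds=0.1` budget below are purely illustrative assumptions to keep the example fast, not a recommendation.

```julia
using BenchmarkTools, Statistics

# Two candidates to compare; here they are literally the same operation,
# so any apparent difference is machine/OS noise that a fair comparison
# should classify as "no real difference".
f(x) = sum(x)
g(x) = sum(x)

input = rand(1000)
trials = 20
ta = Float64[]; tb = Float64[]

# Interleave short trials of f and g on freshly deepcopied inputs, so that
# slow drifts in machine/OS state (frequency scaling, background load, ...)
# affect both candidates roughly alike instead of biasing one of them.
for _ in 1:trials
    x = deepcopy(input)
    push!(ta, minimum(@benchmark f($x) seconds=0.1).time)
    x = deepcopy(input)
    push!(tb, minimum(@benchmark g($x) seconds=0.1).time)
end

# Crude summaries: the median ratio and how often each candidate "wins".
# A real analysis would put a proper statistical test here.
println("median(ta) / median(tb) = ", median(ta) / median(tb))
println("fraction of trials with ta < tb = ", mean(ta .< tb))
```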

Anyway, this conversation was really interesting and I have learned a lot; I found your link on leaky abstractions especially useful, thank you again @Sukera. However, since this topic has been resolved regarding the reasons why there are systematic differences for the same function, I think it’s best that I end my participation in this thread, and perhaps I will open a new topic in the future with very specific examples of what I am trying to convey.
