I don’t think anybody is claiming that BenchmarkTools introduces any artifacts, but rather that BenchmarkTools does not account for Machine/OS artifacts when comparing benchmarks.
What users typically want when using BenchmarkTools is to answer questions like these:
- Is A faster than B?
- By how much?
- How confident are we in those differences?
None of these questions are directly addressed by BenchmarkTools as of today.
`judge` cannot tell you whether there is a real difference, and it cannot even tell you in which direction the real difference goes.
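To make that concrete, here is a minimal sketch (the function and input are placeholders I made up just for illustration, not the ones from my earlier example):

```julia
using BenchmarkTools

# Benchmark the exact same function twice; any difference between the two
# trials can only come from Machine/OS noise, not from the code itself.
f(x) = sum(abs2, x)
x = rand(1000)

b1 = @benchmark f($x)
b2 = @benchmark f($x)

# `judge` compares two point estimates against a fixed tolerance (5% on time
# by default) and classifies the difference as improvement, regression or
# invariant. It attaches no probability or confidence to that verdict, so a
# noisy run can flip the classification in either direction.
judge(minimum(b1), minimum(b2))
```

The verdict answers "is there a difference" only relative to an arbitrary fixed tolerance, and it comes with no measure of confidence at all.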
My point in using the `deepcopy`/`nice` settings is not to prove that that's the best way to do things (in fact I thought of a few other ways to diminish the effect of Machine/OS artifacts that I didn't use in that example), but rather that yes, we can reliably detect those micro differences (or, in this case, correctly fail to detect them when there are none).
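For reference, this is roughly the kind of setup I mean by the `deepcopy`/`nice` settings (again with a made-up workload; on Linux you would launch the script with something like `nice -n -20 julia bench.jl`, which may need elevated permissions):

```julia
using BenchmarkTools

# Run this script under `nice` (e.g. `nice -n -20 julia bench.jl`) so the OS
# scheduler preempts it as little as possible.

g(x) = sum(abs2, x)
data = rand(10_000)

# `deepcopy` in the setup gives every sample a fresh input, so memory layout
# and cache state carried over from earlier samples do not systematically
# favour one of the two trials.
b1 = @benchmark g(x) setup=(x = deepcopy($data))
b2 = @benchmark g(x) setup=(x = deepcopy($data))

# With the noise reduced, repeated comparisons of the same function should
# come out invariant; whatever residual difference remains is a Machine/OS
# artifact rather than a property of the code.
judge(minimum(b1), minimum(b2))
```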
Anyway, this conversation was really interesting and I have learnt a lot; I found your link to leaky abstractions especially useful, thank you again @Sukera. However, since this topic has been resolved regarding the reasons why there are systematic differences for the same function, I think it's best that I end my participation in this thread, and perhaps I will open a new topic in the future with very specific examples of what I am trying to convey.