I understand that BenchmarkTools’ job is a hard one, but is it expected that I frequently see several regressions when rerunning the same benchmark (without any code change)? Is there some theory that (under certain assumptions) predicts how often I should see a false positive?
The standard benchmarking procedure is, AFAICT:
```julia
bench = @benchmarkable myfun()
tune!(bench)
first_run = run(bench)
# Change code
second_run = run(bench)
judge(median(second_run), median(first_run))
```
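To make the first question concrete, here is roughly how I would measure the false-positive rate empirically, rerunning the identical benchmark with no code change and counting spurious regressions (just a sketch; `myfun` and the 20 reruns are placeholders):

```julia
using BenchmarkTools

# Tune once, then rerun the *same* benchmark without touching the code.
bench = @benchmarkable myfun()
tune!(bench)
baseline = run(bench)

# Count how often judge flags a regression that cannot be real.
n_regressions = count(1:20) do _
    trial = run(bench)
    time(judge(median(trial), median(baseline))) == :regression
end
@show n_regressions
```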
For reasons I won’t explain here (serialization), I have to do it like this in my package:
```julia
first_run = @benchmark myfun()
# Change code
bench = @benchmarkable myfun()
loadparams!(bench, params(first_run))
second_run = run(bench)
judge(median(second_run), median(first_run))
```
Is that equivalent? Furthermore, shouldn’t there be a `warmup(bench)` call before `second_run`? It seems like a reasonable thing to do, to get rid of the JIT compilation triggered by the code change, but the source doesn’t seem to do that.
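For reference, here is where I would put the warmup in my version (a sketch of the placement I have in mind, not something the source actually does):

```julia
first_run = @benchmark myfun()
# Change code
bench = @benchmarkable myfun()
loadparams!(bench, params(first_run))
warmup(bench)  # run once, untimed, so JIT compilation of the new code isn't measured
second_run = run(bench)
judge(median(second_run), median(first_run))
```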