BenchmarkTools theory and practice

I understand that BenchmarkTools’ job is a hard one, but is it expected that I frequently see regressions reported when rerunning the same benchmark (without any code change)? Is there some theory that (under certain assumptions) predicts how often I should see a false positive?

The standard benchmarking procedure is, AFAICT:

bench = @benchmarkable myfun()
tune!(bench)                 # pick evals/samples parameters for this machine
first_run = run(bench)
# Change code
second_run = run(bench)
judge(median(second_run), median(first_run))   # judge compares estimates, e.g. the medians
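
For reference, judge operates on trial estimates (e.g. the minimum or median of a Trial) and returns a TrialJudgement that classifies the change as an improvement, regression, or invariant relative to a tolerance. A minimal sketch of inspecting that result, assuming the runs above:

j = judge(median(second_run), median(first_run))
isregression(j)                                       # true if the time ratio exceeds the tolerance
isimprovement(j)
time(ratio(median(second_run), median(first_run)))    # raw ratio of the median times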

For reasons I won’t explain here (serialization), I have to do it like this in my package:

first_run = @benchmark myfun()
# Change code
bench = @benchmarkable myfun()
loadparams!(bench, params(first_run))   # reuse the parameters tuned for first_run
second_run = run(bench)
judge(median(second_run), median(first_run))

Is that equivalent? Furthermore, shouldn’t there be a warmup(bench) call before second_run? It seems like a reasonable thing to do, to get rid of the JIT overhead from the code change, but the source doesn’t seem to do that.

cc. @jrevels

Is that equivalent?

Yes, they should be (minus JIT overhead for the second run).

For reference, a lot of the theory behind BenchmarkTools is described in this paper.

Furthermore, shouldn’t there be a warmup(bench) call before second_run

Yes. It’s up to the user to add this, though. We should probably add something in the docs that says “if you don’t run the tuning process on a benchmark, and you care about warming up the benchmark to get rid of JIT overhead, you should manually call warmup before running the benchmark.”
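
For instance, a minimal sketch of how that manual warmup could slot into the serialization-based procedure from the question (warmup is exported by BenchmarkTools):

bench = @benchmarkable myfun()
loadparams!(bench, params(first_run))
warmup(bench)            # executes the kernel once so compilation happens outside the timed samples
second_run = run(bench)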

It seems like a reasonable thing to do, to get rid of the JIT from the code change, but the source doesn’t seem to do that.

This was a purposeful decision.

For “power users” (e.g. folks who use @benchmarkable instead of @benchmark), BenchmarkTools should always respect user settings instead of making decisions for the user. This is because BenchmarkTools doesn’t have enough knowledge to decide, in the general case, whether it’s “correct” to execute the benchmark kernel “a hidden extra time” for the sake of getting rid of JIT overhead (e.g. the kernel could be non-idempotent or have side effects).
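
For example (an illustrative sketch, not taken from this thread), a kernel with side effects where a hidden extra execution would change what the timed runs observe:

xs = collect(1:10^6)
bench = @benchmarkable pop!($xs)   # every evaluation mutates xs, so an implicit warmup run would alter the state being benchmarked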

In the more naive use case (e.g. @btime and @benchmark), the tuning process takes care of JIT overhead for the users.
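
So with the macro-based workflow the warmup is implicit in tuning; a sketch of that simpler path, assuming the same myfun as above:

using BenchmarkTools
first_run = @benchmark myfun()     # @benchmark tunes (and thereby warms up) before collecting samples
# Change code
second_run = @benchmark myfun()    # tuning runs again, so JIT overhead from the change is excluded
judge(median(second_run), median(first_run))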
