BenchmarkTools with simple, fast-running function

#1

I thought BenchmarkTools was designed to handle simple, fast-running functions by evaluating them many times per sample, to get around the limited precision of the timer.

Using the example from the README, it doesn’t look like the function is actually executed on each evaluation: the reported time per evaluation just shrinks as `evals` grows. Am I misunderstanding something?

julia> @benchmark sin(1) evals=1
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     18.000 ns (0.00% GC)
  median time:      20.000 ns (0.00% GC)
  mean time:        19.785 ns (0.00% GC)
  maximum time:     39.000 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark sin(1) evals=1000
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.017 ns (0.00% GC)
  median time:      0.020 ns (0.00% GC)
  mean time:        0.020 ns (0.00% GC)
  maximum time:     0.035 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark sin(1) evals=1000000
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.001 ns (0.00% GC)
  median time:      0.001 ns (0.00% GC)
  mean time:        0.001 ns (0.00% GC)
  maximum time:     0.001 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000000
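For reference, the idea behind `evals/sample` can be sketched by hand: time a loop of `evals` calls as one sample, then divide the elapsed time by `evals`. The helper `time_per_eval` below is hypothetical and is not how BenchmarkTools is actually implemented. It only illustrates the division. The readings above fit a nearly constant cost per sample: 18 ns at `evals=1` and 0.017 ns × 1000 ≈ 17 ns at `evals=1000`. That is what you would expect if timer overhead dominates and the call itself costs almost nothing.

```julia
# Hypothetical helper (not part of BenchmarkTools): run `f` `evals` times
# as one timed sample and return the average time per call in nanoseconds,
# plus the accumulated result so the loop's output is actually used.
function time_per_eval(f, evals::Integer)
    acc = 0.0
    t0 = time_ns()
    for _ in 1:evals
        acc += f()
    end
    t1 = time_ns()
    return (t1 - t0) / evals, acc
end

for evals in (1, 1_000, 1_000_000)
    ns, _ = time_per_eval(() -> sin(1), evals)
    println("evals = $evals: ", ns, " ns per evaluation")
end
```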



#2

Benchmarking is often tricky to get right for tiny functions, since Julia is so good at optimizing your code. Sub-nanosecond times like the ones in your output mean the compiler has replaced your expression with a constant, so you end up benchmarking nothing at all. (1 nanosecond is ~3 CPU clock cycles on a 3 GHz computer, so 0.01 ns is not enough time to do anything.)
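The clock-cycle arithmetic, spelled out for a 3 GHz CPU:

$$
t_{\text{cycle}} = \frac{1}{3\times 10^{9}\,\text{Hz}} \approx 0.33\,\text{ns},
\qquad
\frac{0.01\,\text{ns}}{0.33\,\text{ns/cycle}} \approx 0.03\ \text{cycles}
$$

So a reported 0.01 ns per evaluation is about 3% of a single clock cycle, far too short for a real call to `sin`.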

To get around that, I usually try to structure the benchmark expression so that the compiler can’t constant-fold it or cheat in any other way. In the example below, I also chose a vector large enough that the CPU’s branch predictor can’t learn the branching pattern.
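For a scalar call like the `sin(1)` from the question, the BenchmarkTools manual describes an idiom for this: interpolate the argument through a `Ref` and dereference it inside the benchmarked expression, so the compiler cannot treat the input as a known constant. This is a minimal sketch. Timings depend on the machine, so none are shown.

```julia
using BenchmarkTools

# Literal argument: the compiler can fold sin(1) into a constant,
# so this may report a sub-nanosecond time.
@btime sin(1)

# The input is hidden behind an interpolated Ref and loaded with []
# inside the expression, so sin has to be evaluated at run time.
# @btime also returns the value of the benchmarked expression.
r = @btime sin($(Ref(1))[])
```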

julia> v = rand(100_000);

julia> a = similar(v);

julia> @btime $a .= sin.($v);
  626.487 μs (0 allocations: 0 bytes)

100k calls in 626 μs works out to about 6.26 ns per call to sin (when called through broadcasting).
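The per-call arithmetic (626.487 μs / 100,000 ≈ 6.26 ns) can also be done programmatically. `@belapsed` returns the minimum elapsed time in seconds, so dividing by the vector length gives an approximate per-call cost. This is a sketch: the figure includes broadcast and memory traffic, not just `sin` itself.

```julia
using BenchmarkTools

v = rand(100_000)
a = similar(v)

# @belapsed returns the minimum elapsed time of the expression, in seconds.
t = @belapsed $a .= sin.($v)

# Divide by the number of elements and convert to nanoseconds per call.
per_call_ns = t / length(v) * 1e9
println(round(per_call_ns; digits = 2), " ns per call")
```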

And yes, the example in the BenchmarkTools manual is now broken, and it would be great if the docs discussed these pitfalls in more detail. Cf. this issue:

#3

Thanks. I had suspected that, but since it was used as an example, I assumed BenchmarkTools had some way to disable the optimization. Rust, for example, has black_box, which forces evaluation.

I’ve created an issue to hopefully avoid future confusion:

#4

I think this PR was meant to address that (disabling optimizations), but unfortunately it was never completed.