How to benchmark properly? Should defaults change?

Which result is correct here? I guess it is the one with evals=1. In that case, couldn’t that be the default option?

julia> using BenchmarkTools

julia> @btime sin(5.0)
  1.538 ns (0 allocations: 0 bytes)
-0.9589242746631385

julia> x = 5.0
5.0

julia> @btime sin($x)
  7.530 ns (0 allocations: 0 bytes)
-0.9589242746631385

julia> @btime sin($x) evals=1
  40.000 ns (0 allocations: 0 bytes)
-0.9589242746631385


@btime reports the minimum time, so of course evals=1 will make a difference.

When things can be constant propagated, I usually Ref wrap the input like:

julia> using BenchmarkTools

julia> @btime sin(5.0) # bogus result from const prop
  0.013 ns (0 allocations: 0 bytes)
-0.9589242746631385

julia> x = 5.0
5.0

julia> @btime sin($x) # bogus result from const prop
  0.013 ns (0 allocations: 0 bytes)
-0.9589242746631385

julia> @btime sin($(Ref(x))[]) # ok
  6.476 ns (0 allocations: 0 bytes)
-0.9589242746631385

I don’t think it is simply that. With evals=1 it runs many samples of one evaluation each. Without it, it runs many samples of many evaluations each (at least, that is what I understand from the manual). My understanding is that when more than one evaluation per sample is run, we get some artifact associated with caching of results.

julia> @benchmark sin($x)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  7.539 ns … 42.284 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     8.425 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   9.240 ns Β±  1.905 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

      β–ˆ β–…                                                     
  β–β–‚β–†β–…β–ˆβ–ˆβ–ˆβ–β–β–β–‚β–‚β–ƒβ–β–ƒβ–β–‚β–β–‚β–β–‚β–…β–β–†β–‚β–β–„β–β–β–‚β–β–β–β–‚β–β–β–β–β–β–β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β– β–‚
  7.54 ns        Histogram: frequency by time        15.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sin($x) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  41.000 ns … 542.000 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     45.000 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   45.989 ns Β±   5.730 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

           ▁    β–†    β–ˆ                                          
  β–ƒβ–β–β–β–…β–β–β–β–β–ˆβ–β–β–β–β–ˆβ–β–β–β–β–ˆβ–β–β–β–β–‡β–β–β–β–β–†β–β–β–β–β–‡β–β–β–β–β–…β–β–β–β–β–…β–β–β–β–β–„β–β–β–β–β–„β–β–β–β–β–ƒ β–ƒ
  41 ns           Histogram: frequency by time           53 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.


@kristoffer.carlsson I think you still have constant propagation in your Ref example, but within samples. I might be wrong, though. What do you get if you use evals=1? (Your timings are very different from mine here.)

Here I get:

julia> @btime sin($(Ref(x))[]) 
  7.778 ns (0 allocations: 0 bytes)
-0.9589242746631385

julia> @btime sin($(Ref(x))[]) evals=1
  40.000 ns (0 allocations: 0 bytes)
-0.9589242746631385

I use nightly Julia which probably has better constant propagation.


That is good news for Julia, but not so much for BenchmarkTools, which becomes harder to understand.

I still don’t understand the difference in the results. It is even strange that, for example, using evals=1 one gets:

julia> @btime sin($x) evals=1
  40.000 ns (0 allocations: 0 bytes)
-0.9589242746631385

and with evals=2 one gets:

julia> @btime sin($x) evals=2
  24.500 ns (0 allocations: 0 bytes)
-0.9589242746631385

And by increasing evals one converges to about 7 ns, which is (?) the correct benchmark (?).

These results are completely systematic, so they do not seem to be caused by random fluctuations of the benchmark.

I checked the code a few months ago, and it seemed that the runtime of a single time_ns() call is always added to the measurement, resulting in a ~25 ns/evals error on my machine.
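If that reading of the code is right, the reported per-evaluation time behaves like true_time + overhead/evals. Here is a sketch (not the actual BenchmarkTools source); the 7 ns and 33 ns figures are inferred from the timings earlier in this thread, not measured here:

```julia
# Sketch of a per-sample timing loop: the fixed cost of the surrounding
# time_ns() calls is amortized over the `evals` inner evaluations.
function naive_sample(f, x, evals)
    t0 = time_ns()
    for _ in 1:evals
        f(x)
    end
    t1 = time_ns()
    return (t1 - t0) / evals
end

# Model of the reported time: true_time + overhead / evals.
# Assuming true_time ≈ 7 ns and overhead ≈ 33 ns (from the numbers above):
reported(evals) = 7.0 + 33.0 / evals
reported(1)    # 40.0 ns — matches the evals=1 result
reported(2)    # 23.5 ns — close to the evals=2 result (24.5 ns)
reported(999)  # ≈ 7.03 ns — the converged value
```

This reproduces the systematic pattern: the error shrinks as 1/evals, which is why the default many-evaluations mode converges to ~7 ns.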


Ah, that is one reason for the problem. Ok. So that systematic error is diluted when one uses many evaluations in each sample.

That, combined with the constant-propagation issue, makes benchmarking a little bit confusing. Perhaps there is room for improvement in the API?

I’ve found that I get more robust results by just broadcasting over an input vector:

julia> @btime sin.(x) setup=(x=rand(1000));
  6.083 ΞΌs (1 allocation: 7.94 KiB)

It’s less ergonomic, but it does a good job guarding against overly-aggressive constant propagation.


This is also good for functions with branches. For example, sin will be faster for numbers less than pi/4.


Plus, it makes the SIMD implementations look good. :wink:

I normally Ref-wrap any isbits structs I’m benchmarking.

40ns is way too long. That’d be well over 100 clock cycles for most CPUs.
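That estimate can be checked with quick arithmetic (assuming a ~3 GHz clock, i.e. 3 cycles per nanosecond — an assumed figure, not one from this thread):

```julia
# Clock cycles spent in 40 ns on an assumed 3 GHz CPU (3 cycles per ns)
cycles = 40e-9 * 3e9   # ≈ 120 cycles, well over 100
```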

Couldn’t that be automatic?

It’s of course possible that the compiler will get smart enough to defeat this, too, eventually.

But I’d be in favor of it ref-wrapping everything by default.

The macro doesn’t have access to type information, but it could probably generate code that’s the equivalent of

rx = Ref(x)
isbits(x) ? rx[] : x   # Ref-wrap isbits values so they cannot be const-propagated

Or maybe just ref-wrap everything by default.
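A hypothetical helper sketching what such automatic ref-wrapping might look like (refwrap and unwrap are made-up names, not part of the BenchmarkTools API):

```julia
# Hypothetical sketch (not BenchmarkTools API): wrap isbits values in a Ref
# so the compiler cannot constant-propagate them into the benchmarked call;
# non-isbits values (e.g. arrays) are passed through unchanged.
refwrap(x) = isbits(x) ? Ref(x) : x
unwrap(r::Ref) = r[]
unwrap(x) = x
```

The benchmarked expression would then call `f(unwrap(rx))` where `rx = refwrap(x)`, which is exactly what `$(Ref(x))[]` spells out by hand today.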


There is some disagreement on whether the minimum time should be used:

Robust benchmarking in noisy environments (2016)

Minimum Times Tend to Mislead When Benchmarking (2019)