[ANN] BenchmarkHistograms.jl

BenchmarkHistograms.jl is a simple package that provides a UnicodePlots.jl-powered show method for BenchmarkTools' @benchmark. The idea was originally discussed in BenchmarkTools.jl#180.

BenchmarkHistograms works by exporting its own @benchmark macro, which outputs a BenchmarkHistogram object holding the results (obtained, in turn, from BenchmarkTools.@benchmark); the custom show method is defined for this object. In other words, BenchmarkHistograms does not commit type piracy; it simply provides an alternative @benchmark.

For ease of use, BenchmarkHistograms re-exports all the other exports of BenchmarkTools. So one can simply call using BenchmarkHistograms instead of using BenchmarkTools, and the only difference will be that @benchmark's show method will plot a histogram in addition to displaying the usual values.
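For the curious, the wrap-and-show pattern can be sketched in a few lines. This is a hypothetical, simplified version (using @elapsed as a stand-in for BenchmarkTools.@benchmark); the names are illustrative, not the package's actual internals:

```julia
# Minimal sketch of the wrap-and-show pattern, with illustrative names.
# `@elapsed` stands in for BenchmarkTools.@benchmark here.
struct WrappedResult
    seconds::Float64
end

macro mybench(expr)
    # Delegate the measurement, then wrap the result in our own type.
    # Defining `show` on our own type (below) is what avoids type piracy.
    :(WrappedResult(@elapsed $(esc(expr))))
end

function Base.show(io::IO, ::MIME"text/plain", w::WrappedResult)
    print(io, "elapsed: ", w.seconds, " s (the real package prints a histogram here)")
end

w = @mybench sum(rand(1000))
```

Since the custom type owns the show method, users who want the raw BenchmarkTools behaviour can always reach into the wrapped field directly.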

Example

The following simple example from the README is designed to show that, for some benchmarks, looking only at summary statistics can miss important information.

julia> @benchmark 5 ∈ v setup=(v = sort(rand(1:10_000, 10_000)))

samples: 3192; evals/sample: 1000; memory estimate: 0 bytes; allocs estimate: 0
                       ┌                                        ┐ 
      [   0.0,  500.0) ─▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2036   
      [ 500.0, 1000.0) ─ 0                                        
      [1000.0, 1500.0) ─ 0                                        
   ns [1500.0, 2000.0) ─ 0                                        
      [2000.0, 2500.0) ─ 0                                        
      [2500.0, 3000.0) ─ 0                                        
      [3000.0, 3500.0) ─▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1156                  
                       └                                        ┘ 
                                        Counts
min: 1.875 ns (0.00% GC); mean: 1.141 μs (0.00% GC); median: 4.521 ns (0.00% GC); max: 3.315 μs (0.00% GC).

Here, we see a bimodal distribution; in the case 5 is indeed in the vector, we find it very quickly, in the 0-500 ns range (thanks to sort which places it at the front). In the case 5 is not present, we need to check every entry to be sure, and we end up in the 3000-3500 ns range.
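The bimodality can be reproduced directly at the REPL: `in` on a `Vector` is a linear scan with early exit, so a hit near the front of the sorted vector returns almost immediately, while a miss must visit every entry. A quick illustration in plain Julia (not package code):

```julia
# `∈` on a Vector scans left to right and stops at the first match.
v = sort(rand(1:10_000, 10_000))

# The smallest element sits at index 1 after sorting, so it is found
# after a single comparison:
@assert findfirst(==(v[1]), v) == 1

# 0 can never appear in 1:10_000, so the scan inspects all 10_000
# entries before giving up:
@assert findfirst(==(0), v) === nothing
```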

Happy benchmarking!

49 Likes

very cool!

1 Like
julia> @benchmark bevalpoly!($y, $x)
samples: 10000; evals/sample: 200; memory estimate: 0 bytes; allocs estimate: 0
                     ┌                                        ┐
      [400.0, 410.0) ─▇▇▇▇▇▇▇▇▇▇▇ 2193
      [410.0, 420.0) ─▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 6693
      [420.0, 430.0) ─▇▇▇▇ 826
      [430.0, 440.0) ─ 91
      [440.0, 450.0) ─ 75
   ns [450.0, 460.0) ─ 55
      [460.0, 470.0) ─ 33
      [470.0, 480.0) ─ 27
      [480.0, 490.0) ─ 5
      [490.0, 500.0) ─ 1
      [500.0, 510.0) ─ 0
      [510.0, 520.0) ─ 1
                     └                                        ┘
                                      Counts
min: 400.945 ns (0.00% GC); mean: 416.129 ns (0.00% GC); median: 419.205 ns (0.00% GC); max: 515.420 ns (0.00% GC).

julia> @benchmark evalpoly!($y, $x)
samples: 10000; evals/sample: 200; memory estimate: 0 bytes; allocs estimate: 0
                     ┌                                        ┐
      [400.0, 450.0) ─▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 9873
      [450.0, 500.0) ─ 116
      [500.0, 550.0) ─ 2
   ns [550.0, 600.0) ─ 8
      [600.0, 650.0) ─ 0
      [650.0, 700.0) ─ 0
      [700.0, 750.0) ─ 0
      [750.0, 800.0) ─ 1
                     └                                        ┘
                                      Counts
min: 400.950 ns (0.00% GC); mean: 416.991 ns (0.00% GC); median: 419.235 ns (0.00% GC); max: 777.840 ns (0.00% GC).

Based on min, mean, and median times, these both performed almost exactly the same.
But the second has a handful of samples in the 550-600 ns range, and a much higher maximum, which puts the two histograms on different scales and makes visual comparison difficult.

If we have a huge number of samples (e.g., 10_000), it may be nice to trim off some of the extremes. That'd also help us see more of a breakdown in the 400-450 ns range in the above example.

1 Like

Good point, I've noticed this issue too. Any ideas for what the methodology should be for selecting outliers? My first guess is that the gap between the median and the minimum would be a decent reference for the spread (based partly on https://juliaci.github.io/BenchmarkTools.jl/dev/manual/#Which-estimator-should-I-use?), and if there are a small number of points that exceed the median plus some multiple of that gap, then maybe those are outliers. Maybe there is something more principled, though.
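For what it's worth, the median-plus-gap rule could be sketched like this; the multiplier `k` and the function name are assumptions for illustration, not anything the package implements:

```julia
using Statistics

# Hypothetical outlier rule: use the gap between the median and the minimum
# as a spread estimate, and flag points beyond median + k * gap.
function flag_outliers(times; k = 20)
    med = median(times)
    gap = med - minimum(times)
    return times .> med + k * gap
end

times = vcat(9.0, fill(10.0, 98), 500.0)   # one obvious straggler
flag_outliers(times)                        # only the 500.0 is flagged
```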

For making a comparison between two benchmarks, I think there is probably something better we could do than just removing outliers, like plotting the two distributions side-by-side with the same bins, or even on the same plot. I'm not sure what the right API for that would be; BenchmarkTools uses judge for comparing estimators (min/median/etc.) and outputs invariant, improvement, or regression along with the % change. We could add a judge method for BenchmarkHistogram objects, but I'm not sure we'd want to actually make a judgement as much as show a comparison plot, so maybe that's not the right function to use.
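The shared-bins idea could look something like the following; `shared_counts` is a hypothetical helper, not a proposed API:

```julia
# Bin two benchmark runs against common edges computed from the pooled
# samples, so the resulting histograms are directly comparable.
function shared_counts(a, b; nbins = 10)
    lo, hi = extrema(vcat(a, b))
    width = (hi - lo) / nbins
    binof(x) = min(nbins, 1 + floor(Int, (x - lo) / width))
    ca, cb = zeros(Int, nbins), zeros(Int, nbins)
    foreach(x -> ca[binof(x)] += 1, a)
    foreach(x -> cb[binof(x)] += 1, b)
    return range(lo, hi; length = nbins + 1), ca, cb
end
```

Plotting `ca` and `cb` against the same edges then puts both distributions on one scale.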

2 Likes

BenchmarkHistograms v0.2 is out and now collects possible "outliers" into the last bin. E.g., using the example

f() = sum((sin(i) for i in 1:round(Int, 1000 + 100*randn())))

@benchmark f()

we get

samples: 10000; evals/sample: 4; memory estimate: 0 bytes; allocs estimate: 0
ns

 (7290.0  - 7570.0 ]  ▎18
 (7570.0  - 7850.0 ]  █▎94
 (7850.0  - 8140.0 ]  ████▌340
 (8140.0  - 8420.0 ]  ██████████████▎1091
 (8420.0  - 8700.0 ]  ████████████████████████▊1902
 (8700.0  - 8980.0 ]  ██████████████████████████████2316
 (8980.0  - 9260.0 ]  █████████████████████████▌1965
 (9260.0  - 9540.0 ]  ████████████████▋1282
 (9540.0  - 9820.0 ]  ███████▋580
 (9820.0  - 10100.0]  ███▌262
 (10100.0 - 10390.0]  █72
 (10390.0 - 10670.0]  ▌37
 (10670.0 - 10950.0]  ▍22
 (10950.0 - 11230.0]  ▏9
 (11230.0 - 29540.0]  ▎10

                  Counts

min: 7.292 μs (0.00% GC); mean: 8.923 μs (0.00% GC); median: 8.886 μs (0.00% GC); max: 29.541 μs (0.00% GC).

You can see the last bin is much larger than the others, spanning from 11230.0 to 29540.0. Before (using the exact same benchmark times, just replotting it), we would have had

samples: 10000; evals/sample: 4; memory estimate: 0 bytes; allocs estimate: 0
                         ┌                                        ┐ 
      [ 6000.0,  8000.0) ─▇ 218                                     
      [ 8000.0, 10000.0) ─▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 9552   
      [10000.0, 12000.0) ─▇ 223                                     
      [12000.0, 14000.0) ─ 2                                        
      [14000.0, 16000.0) ─ 3                                        
   ns [16000.0, 18000.0) ─ 0                                        
      [18000.0, 20000.0) ─ 1                                        
      [20000.0, 22000.0) ─ 0                                        
      [22000.0, 24000.0) ─ 0                                        
      [24000.0, 26000.0) ─ 0                                        
      [26000.0, 28000.0) ─ 0                                        
      [28000.0, 30000.0) ─ 1                                        
                         └                                        ┘ 
                                          Counts
min: 7.292 μs (0.00% GC); mean: 8.923 μs (0.00% GC); median: 8.886 μs (0.00% GC); max: 29.541 μs (0.00% GC).

You can see the huge outlier near 30,000 ns forces us to have many empty bins, while the bulk of the roughly Gaussian distribution occupies only 3 bins.

There is a setting that controls this, BenchmarkHistograms.OUTLIER_QUANTILE, which defaults to 0.999. It works as follows: if the 99.9th percentile is more than 2 bins away from the true maximum, the points beyond it are treated as outliers and all of those bins are squashed into a single bin. E.g., in this case, the 99.9th percentile is 11229.33.
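In rough Julia, the rule reads as follows (the function name is illustrative; the quantile default and the 2-bin threshold are as described above, but the actual implementation may differ):

```julia
using Statistics

# Sketch of the v0.2 squashing rule: if the chosen quantile sits more than
# two bin-widths below the maximum, merge everything above it into one
# final catch-all bin.
function bin_edges(times; q = 0.999, nbins = 15)
    lo, hi = extrema(times)
    width = (hi - lo) / nbins
    qv = quantile(times, q)
    if hi - qv > 2 * width
        # regular bins up to the quantile, then a single outlier bin to hi
        return vcat(collect(range(lo, qv; length = nbins)), hi)
    else
        return collect(range(lo, hi; length = nbins + 1))
    end
end
```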

We also dropped the UnicodePlots dependency in favor of some custom histogram code written by @brenhinkeller in https://github.com/JuliaCI/BenchmarkTools.jl/pull/180#issuecomment-711128281, which made it easier to customize this special binning behavior.

17 Likes

Does this package work with all versions of Julia from 1.0 to 1.6?

Yep!

1 Like

BenchmarkTools v1.1 provides fancy histogram printing by default with @benchmark (thanks to BenchmarkTools#217 by @tecosaur), so BenchmarkHistograms is not needed anymore and will probably not see any more development or releases. Thanks for the interest, and Pkg.update("BenchmarkTools")!

julia> @benchmark 5 ∈ v setup=(v = sort(rand(1:10_000, 10_000)))
BenchmarkTools.Trial: 3125 samples with 1000 evaluations.
 Range (min … max):  2.500 ns … 6.578 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.333 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.160 μs ± 1.528 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                    ▇ ▂
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁███▇ █
  2.5 ns      Histogram: log(frequency) by time     3.36 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Edit: I realised some people credit me with helping with the upstream BenchmarkTools functionality; that was actually an entirely unrelated effort! I think it looks great, though, and am happy to retire BenchmarkHistograms (but I can't claim any credit for it).

15 Likes