PSA: Microbenchmarks remember branch history

foobar_lv2 · November 13, 2018, 12:55pm

WTF AMD. I honestly cannot decide whether to be impressed (want to buy a Ryzen now) or disgusted (distrust every published Ryzen benchmark). Both, I guess.

We might need to fix this in BenchmarkTools, at least partially. The fix there would be to use the TSC register for timing, ask users to use setup and then fix evals=1 if a setup is provided. But then we may need to look at the way setup and interpolation are used.

Can you run this once larger? We are now beyond testvector sizes that we can ask users to provide, but I just want to know the limit on the period of patterns Ryzen is capable of learning. It would be very interesting to know what kind of patterns Ryzen is good at, but unfortunately I couldn’t find a detailed account of its branch prediction algorithm.

julia> begin 
           for i=1:2
           N = 480_000
           list = fill(false, N); list[1:2:end].=true;
           bt0 = @belapsed findall($list)
           list .= rand(Bool, N)
           btL = @belapsed findall($list)
           time_to_cycle = 10^9/N * CPU_SPEED_GHZ
           penalty = 2*(btL-bt0)*time_to_cycle
           @printf("\n\n%s; branch-miss penalty: %4.1f ns = %4.1f cycles\n\n", 
           cpu_model, penalty/CPU_SPEED_GHZ , penalty)

           bt = bt0
           @printf("Period %5d: %7.2f us = %7.2f cycles per idx. Miss-rate %5.2f%%\n", 
                   2, bt*10^6, bt*time_to_cycle, 100*(bt - bt0) *time_to_cycle / penalty )
           for n=[100, 500, 1000, 2000, 2500, 3000, 5000, 10_000, 15_000, 20_000, 30_000, 40_000, 60_000, 80_000, 120_000, 240_000, 480_000]
               pat = rand(Bool, n)
               for i=1:n:N list[i:(i+n-1)].=pat end
               bt = @belapsed findall($list) #run thrice to give BPU more time to learn pattern
               bt = @belapsed findall($list)
               bt = @belapsed findall($list)
               @printf("Period %5d: %7.2f us = %7.2f cycles per idx. Miss-rate %5.2f%%\n", 
                   n, bt*10^6, bt*time_to_cycle, 100*(bt - bt0) *time_to_cycle / penalty )
           end
           end           
           end;

Topic		Replies	Views
How I learned to stop worrying about being fastest and love microbenchmarks Community	22	3008	November 3, 2023
Interesting micro-benchmark General Usage	8	699	May 24, 2019
Benchmarking with @time @btime and subsequent runs return shorter execution time New to Julia benchmark , benchmarktools	8	2136	December 31, 2020
Save @btime output New to Julia question , benchmarktools	7	3433	April 1, 2023
Wrong results when I use @btime General Usage question , package , benchmarktools	5	639	November 11, 2021

PSA: Microbenchmarks remember branch history

Related topics