WTF AMD. I honestly cannot decide whether to be impressed (want to buy a Ryzen now) or disgusted (distrust every published Ryzen benchmark). Both, I guess.
We might need to fix this in BenchmarkTools
, at least partially. The fix there would be to use the TSC register for timing, ask users to use setup
and then fix evals=1
if a setup is provided. But then we may need to look at the way setup
and interpolation are used.
Can you run this once larger? We are now beyond testvector sizes that we can ask users to provide, but I just want to know the limit on the period of patterns Ryzen is capable of learning. It would be very interesting to know what kind of patterns Ryzen is good at, but unfortunately I couldn’t find a detailed account of its branch prediction algorithm.
julia> begin
for i=1:2
N = 480_000
list = fill(false, N); list[1:2:end].=true;
bt0 = @belapsed findall($list)
list .= rand(Bool, N)
btL = @belapsed findall($list)
time_to_cycle = 10^9/N * CPU_SPEED_GHZ
penalty = 2*(btL-bt0)*time_to_cycle
@printf("\n\n%s; branch-miss penalty: %4.1f ns = %4.1f cycles\n\n",
cpu_model, penalty/CPU_SPEED_GHZ , penalty)
bt = bt0
@printf("Period %5d: %7.2f us = %7.2f cycles per idx. Miss-rate %5.2f%%\n",
2, bt*10^6, bt*time_to_cycle, 100*(bt - bt0) *time_to_cycle / penalty )
for n=[100, 500, 1000, 2000, 2500, 3000, 5000, 10_000, 15_000, 20_000, 30_000, 40_000, 60_000, 80_000, 120_000, 240_000, 480_000]
pat = rand(Bool, n)
for i=1:n:N list[i:(i+n-1)].=pat end
bt = @belapsed findall($list) #run thrice to give BPU more time to learn pattern
bt = @belapsed findall($list)
bt = @belapsed findall($list)
@printf("Period %5d: %7.2f us = %7.2f cycles per idx. Miss-rate %5.2f%%\n",
n, bt*10^6, bt*time_to_cycle, 100*(bt - bt0) *time_to_cycle / penalty )
end
end
end;