Interaction Between Caching and Benchmarking

Yes, BenchmarkTools will run the same function multiple times. Hence it tells you “if I run the same function many times, on identical data, what is the throughput?”.

Previous runs will affect microarchitectural state – caches, branch predictors, etc. will still be hot.

So you should always rescale the resulting timings by N, and then measure (plot) them for multiple different values of N.
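A minimal sketch of such a sweep (assuming BenchmarkTools is installed; `model!` here is a placeholder for your actual kernel):

```julia
using BenchmarkTools

# Placeholder kernel standing in for your model; replace with your own.
model!(C, A, B) = (C .= A .+ B)

for N in (2^10, 2^14, 2^18, 2^22)
    A, B, C = rand(N), rand(N), zeros(N)
    t = @belapsed model!($C, $A, $B)
    # Rescale by N so different sizes are comparable.
    println("N = $N: $(1e9 * t / N) ns/element")
end
```

Plotting ns/element against N typically shows plateaus as the working set falls out of L1, then L2, then L3, and finally lands in main memory.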

Furthermore, you should always have a simplified reference kernel – in your example, something like `simplifiedModel(C,A,B) = Threads.@threads for n=1:length(C); @inbounds C[n] = A[n] + B[n]; end` (read two large arrays, do a trivial computation, write the result to an output array).
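Written out as a standalone function, that reference kernel might look like this (renamed with a `!` per Julia’s mutating convention):

```julia
# Trivial reference kernel: read two large arrays, do one add per
# element, write the result. This is pure memory traffic.
function simplifiedModel!(C, A, B)
    Threads.@threads for n in 1:length(C)
        @inbounds C[n] = A[n] + B[n]
    end
    return C
end
```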

For output, you should not just print “memory bound”; you should also print the raw data going into that conclusion, e.g. “ok, 32 MB arrays”, as well as the cache sizes of your CPU (e.g. from lscpu).
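For example (this sketch assumes the third-party CpuId.jl package for the cache sizes; any source of that information works, including lscpu):

```julia
using CpuId  # third-party package; provides cachesize()

A = rand(Float64, 2^22)
println("array size: $(sizeof(A) / 2^20) MB")
println("data cache sizes (L1/L2/L3): ", cachesize())
```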

You don’t need to change BLAS threading – you’re doing 2x2 matmuls on StaticArrays, which should be completely inlined, no BLAS involved. And the arithmetic density is obviously pitiful. So you should tweak your code until it benchmarks similarly to simplifiedModel for large arrays, i.e. you should definitely saturate main-memory bandwidth. (Should you saturate L3 bandwidth? Good question. You can see that by plotting normalized runtime against N and checking whether some of the L1/L2/L3/main-memory plateaus vanish!)
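You can convince yourself that no BLAS call is involved for a 2x2 StaticArrays product by inspecting the generated code (sketch, assuming StaticArrays is installed):

```julia
using StaticArrays

a = @SMatrix rand(2, 2)
b = @SMatrix rand(2, 2)

# Shows fully unrolled multiply-adds, with no call out to BLAS:
@code_llvm a * b
```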

This is, in general, a very common misconception about benchmarking: people assume a simplified model where each operation/function takes a certain amount of time, and the time taken for a bunch of ops is the sum of the times taken for each one. Lol nope, “time taken” is not even approximately additive for smallish times.
