I agree a lot with this!
And yet, there is a place for lower-precision, faster benchmarks, especially in REPL use.
Thanks for linking that; it's interesting. And yet, reading it decreases my trust in BenchmarkTools rather than increasing it. The core model is equation (1) on page 3. To excerpt:
Let $P_0$ be a deterministic benchmark program consisting of an instruction tape of $N$ instructions $I^{[1]}, \dots, I^{[N]}$ […]
Let $\tau^{[i]}$ be the run time of instruction $I^{[i]}$. Then the total run time of $P_0$ can be written $T_{P_0} = \sum_{i=1}^{N} \tau^{[i]}$.
This model may have been accurate in the late '80s, but on today's hardware it is so thoroughly wrong that it seriously makes me doubt BenchmarkTools.
Look, the CPU executes many instructions in parallel, has a giant reorder buffer, and carries an enormous amount of microarchitectural state.
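As a rough illustration (a minimal Julia sketch; `sum_dependent` and `sum_independent` are hypothetical names of mine, and it assumes BenchmarkTools is installed): both loops execute essentially the same additions, yet the independent-accumulator version typically runs noticeably faster because the CPU overlaps the adds, so the total run time is not a sum of fixed per-instruction costs.

```julia
using BenchmarkTools

# Same number of floating-point additions, different dependency structure.
function sum_dependent(x)
    s = 0.0
    @inbounds for v in x
        s += v              # every add waits on the previous one
    end
    return s
end

function sum_independent(x)
    s1 = s2 = s3 = s4 = 0.0
    @inbounds for i in 1:4:length(x)-3
        s1 += x[i]          # four independent chains the CPU can overlap
        s2 += x[i+1]
        s3 += x[i+2]
        s4 += x[i+3]
    end
    return s1 + s2 + s3 + s4
end

x = rand(4096)
@btime sum_dependent($x)
@btime sum_independent($x)
```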
Micro-benchmarking is chiefly about controlling that microarchitectural state so that it matches the real context you want to simulate.
See, e.g., this discussion of how the branch predictor's long memory interacts with BenchmarkTools.
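To make that concrete, here is a hedged sketch (my own hypothetical `branchy_sum`, again assuming BenchmarkTools; the compiler may turn the branch into a branchless select, so the effect is not guaranteed on every machine): timing the same fixed input over and over lets the predictor memorize the branch pattern, while regenerating the input per evaluation gives it a fresh pattern each time.

```julia
using BenchmarkTools

# A data-dependent branch; whether it survives codegen as a real branch
# depends on the compiler, so check the native code if the effect is absent.
function branchy_sum(x)
    s = 0.0
    @inbounds for v in x
        if v > 0.5
            s += sqrt(v)
        else
            s += v * v
        end
    end
    return s
end

x_fixed = rand(10_000)

# Same input for every evaluation: across many evaluations the predictor
# can learn this particular sequence of branch outcomes.
@btime branchy_sum($x_fixed)

# `setup` runs once per sample, and with `evals=1` each sample is a single
# evaluation, so the branch pattern changes every time it is measured.
@btime branchy_sum(x) setup=(x = rand(10_000)) evals=1
```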