Making @benchmark outputs statistically meaningful and actionable

Profile-guided optimization (PGO) for Julia user code seems relevant here. Were Julia to gain profile-guided recompilation as an option for Julia packages, as mentioned here, BenchmarkTools could presumably stay simple, because LLVM would take care of the complexity. That is, BenchmarkTools would just need to recompile with PGO, which should (once enough PGO-based optimizations are implemented) make some of the issues mentioned here irrelevant.

No, nice doesn’t cut it. To prevent interruption you’d need to run Julia with real-time priority, perhaps using chrt. You’d also probably need to fiddle with kernel options for real-time scheduling (by default Linux reserves some CPU time for non-realtime processes, so you’d need to turn that off), and possibly with other scheduler options. See this for a start: Real-Time group scheduling — The Linux Kernel documentation
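For example, something like this (a sketch, assuming Linux with chrt from util-linux; bench.jl is a placeholder script, and both commands need root or CAP_SYS_NICE):

```julia
# Sketch: remove the CPU-time reserve Linux keeps for non-realtime tasks
# (kernel.sched_rt_runtime_us defaults to 950000 of every 1000000 us;
# -1 lifts the limit), then launch the benchmark under SCHED_FIFO.
run(`sudo sysctl -w kernel.sched_rt_runtime_us=-1`)
run(`sudo chrt --fifo 50 $(Base.julia_cmd()) bench.jl`)
```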

You might also want to fiddle with kernel and/or CPU options that control the power-saving/performance trade-offs, such as the frequency governor and turbo boost.
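For instance (a sketch, assuming Linux with the cpupower utility installed and an intel_pstate-driven CPU; the sysfs path differs for other drivers):

```julia
# Sketch: pin the frequency governor to "performance" and disable turbo
# boost, so the clock frequency stays roughly constant during runs.
run(`sudo cpupower frequency-set -g performance`)
run(pipeline(`echo 1`, `sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo`))
```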

You might also want to reserve some cores for Julia.
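A sketch of that (assuming Linux; taskset only pins the process, so for real exclusivity you’d also boot with something like isolcpus=2,3 on the kernel command line):

```julia
# Sketch: restrict the benchmark process to cores 2 and 3 with taskset(1).
# Combine with isolcpus at boot so nothing else runs on those cores.
run(`taskset -c 2,3 $(Base.julia_cmd()) bench.jl`)
```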

If you go down this route, make sure not to damage your system. EDIT: if you forbid the kernel from preempting your process, I think you need to make that process yield to the kernel of its own volition, by doing explicit sleeps every so often. Not sure whether that’s doable without modifying BenchmarkTools, and maybe even the Julia runtime.
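A rough idea of what that voluntary yielding could look like in a hand-rolled timing loop (purely hypothetical; this is not how BenchmarkTools measures):

```julia
# Hypothetical timing loop for a process running under SCHED_FIFO:
# time small batches, then sleep briefly so the kernel and other
# tasks get a chance to run between batches.
function timed_batches(f; batches = 100, batchsize = 10_000)
    per_call = Float64[]
    for _ in 1:batches
        t0 = time_ns()
        for _ in 1:batchsize
            f()
        end
        push!(per_call, (time_ns() - t0) / batchsize)
        sleep(0.001)  # voluntary yield to the kernel
    end
    return per_call
end
```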

There are tons of other things that can affect performance. For example, plugging in a laptop’s charger usually improves performance compared to running on battery power, because more power is available to the CPU.

I’m aware of this one. All my tests were done with my laptop plugged in.

A couple of updates to this thread:

  1. Somebody replied to my Stabilizer issue, saying:

“there are problems with stabilizer that I don’t know how to solve with reasonable effort. Check other open issues before proceeding further ahead.”

This suggests that Stabilizer may not be a good solution to randomizing memory layout.

  2. Prof. Berger replied to my email, and shared this document on the design of a benchmark suite for evaluating a couple of web-stack libraries. It’s not particularly relevant or helpful for us, except that it lists the following requirements for mitigating measurement bias caused by memory layout:
    a) Shuffling allocator
    b) Stack padding
    c) Shuffling linker
    d) Randomizing environment variables (see the sketch after this list)
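Of these, (d) is the only one that’s easy to approximate from the Julia side without toolchain changes. A minimal sketch, assuming BenchmarkTools is installed (the variable name PAD, the padding sizes, and the benchmark expression are arbitrary placeholders):

```julia
# Sketch: rerun a benchmark in fresh Julia processes whose environments
# carry a random-length dummy variable, shifting the initial
# environment/stack layout between runs (requirement (d) above).
script = "using BenchmarkTools; print(minimum(@benchmark sin(0.5)).time)"
bench(pad) = addenv(`$(Base.julia_cmd()) -e $script`, "PAD" => pad)
times = [parse(Float64, read(bench("x" ^ rand(0:4096)), String)) for _ in 1:10]
```

The spread of times across these runs then gives a rough sense of how much layout alone moves the measurement.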

There were also links to some helpful academic papers at the bottom of the page.

I don’t know enough about Julia’s compilation process to know where and how these randomizations could be implemented.
