It’s already been out there for a while, but I nonetheless wanted to properly announce ThreadPinning.jl here.
What is ThreadPinning.jl about?
In short, it’s about pinning Julia threads to specific CPU-cores. To that end, it provides
- a convenient visualisation of the system topology and the current thread-core mapping, and
- tools to fully control and specify the latter.
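Here is a minimal sketch of what this looks like in practice, using threadinfo (the visualisation) and pinthreads (the control part). It assumes a Linux machine with ThreadPinning.jl installed, and at least as many CPU threads available to the Julia process as there are Julia threads. The explicit mapping (Julia thread i → CPU thread i-1) is just an arbitrary example.

```julia
using ThreadPinning   # install once via: import Pkg; Pkg.add("ThreadPinning")

# 1) Visualise the system topology and where the Julia threads currently run.
threadinfo()

# 2) Pin Julia thread i to the CPU thread with OS ID i-1 (OS IDs start at 0).
#    Assumes these CPU threads are all usable by this process.
cpuids = collect(0:Threads.nthreads()-1)
pinthreads(cpuids)

# 3) Check the new mapping, visually and programmatically.
threadinfo()
println("Julia threads now run on CPU threads: ", getcpuids())
```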
First impression / demonstration
Dual-socket system where each CPU has 40 hardware threads (20 CPU-cores with 2-way SMT).
Besides using the pinthreads function directly, which is the most powerful interface, you can also specify the thread-core mapping via environment variables or Julia preferences. See this section of the docs for more information.
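The next sketch goes beyond explicit CPU IDs and uses two of the named pinning strategies of pinthreads, :sockets and :numa. It then hands control back to the OS with unpinthreads. My one-line descriptions of the strategies in the comments are approximate; the docs give the exact placement rules. On a machine with a single socket or a single NUMA domain, the two strategies may well produce the same mapping.

```julia
using ThreadPinning

# Distribute the Julia threads across the CPU sockets ...
pinthreads(:sockets)
println(":sockets -> ", getcpuids())

# ... or across the NUMA domains.
pinthreads(:numa)
println(":numa    -> ", getcpuids())

# Give control back to the OS scheduler.
unpinthreads()
```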
Why care about thread pinning?
1) Performance
When you run a multithreaded Julia computation, by default, you’re leaving it to the operating system to distribute your Julia threads among the cores of your CPU(s). While the OS tries to smartly position your threads on the system, it might, for various reasons, make suboptimal choices. For example, it might place multiple Julia threads on the same core or move Julia threads around during the computation. This can dramatically reduce the performance of your computation, especially on more complicated systems such as HPC cluster nodes with multiple CPUs (typically two) and multiple memory domains (NUMA) per CPU.
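You can check the second effect (threads being moved around) yourself with the following sketch. It repeatedly asks the OS which CPU thread the current, unpinned thread is running on. It deliberately bypasses ThreadPinning.jl and calls glibc’s sched_getcpu directly via ccall (Linux only). The helper names current_cpu and sample_cpus are made up for this example. Whether you actually observe migrations depends on the machine and its load.

```julia
# Linux/glibc only: sched_getcpu() returns the CPU thread the caller is running on.
current_cpu() = Int(ccall(:sched_getcpu, Cint, ()))

# Record the CPU of the calling (unpinned) thread `n` times. Between samples the
# OS thread is blocked briefly, which gives the scheduler a chance to move it.
function sample_cpus(n::Integer; pause = 0.001)
    cpus = Vector{Int}(undef, n)
    for i in 1:n
        Libc.systemsleep(pause)   # blocks the OS thread without yielding to other Julia tasks
        cpus[i] = current_cpu()
    end
    return cpus
end

samples = sample_cpus(200)
println("distinct CPU threads visited: ", sort(unique(samples)))
```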
(Even if you pin your threads, there are, of course, many different thread-core mappings one can imagine, and each of them can lead to very different performance; see here for a simple example.)
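A rough way to compare two mappings yourself is to pin the threads, run a memory-bound kernel, and repeat with another mapping. In this sketch, the kernel (a STREAM-like triad), the array size (about 120 MB in total), and the two CPU ID lists are all illustrative assumptions. Which cores or sockets the OS CPU IDs correspond to depends on your system’s numbering, so check with threadinfo() first. No particular outcome is implied.

```julia
using ThreadPinning
using Base.Threads: @threads, nthreads

# STREAM-like triad a = b + s*c, statically split across the Julia threads.
function triad!(a, b, c, s)
    @threads :static for i in eachindex(a, b, c)
        @inbounds a[i] = b[i] + s * c[i]
    end
    return a
end

n, N = nthreads(), Sys.CPU_THREADS
# Two illustrative mappings (OS CPU IDs start at 0).
# Assumes all IDs below N are usable by this process.
mappings = Dict(
    "first n IDs"       => collect(0:n-1),
    "evenly spaced IDs" => [div(i * N, n) for i in 0:n-1],
)

len = 5_000_000                           # 3 arrays x 8 bytes x 5e6 = about 120 MB
a, b, c = zeros(len), rand(len), rand(len)
results = Dict{String,Float64}()

for (name, cpuids) in mappings
    pinthreads(cpuids)
    triad!(a, b, c, 2.0)                  # warm-up run (compilation etc.)
    results[name] = minimum([@elapsed(triad!(a, b, c, 2.0)) for _ in 1:10])
    println(rpad(name, 18), "CPUs ", getcpuids(), "  best: ", results[name], " s")
end
```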
2) Hardware performance monitoring
A number of low-level tools operate at the CPU-core level, so to use them you need to know, or specify, which CPU-core your Julia thread is running on. One example is LIKWID.jl, which lets you measure the performance of Julia code at the hardware level by monitoring hardware counters inside CPU-cores (see my JuliaCon 2022 talk about this). In fact, LIKWID.jl’s @perfmon macro uses ThreadPinning.jl under the hood to make sure that the Julia thread actually runs on the monitored CPU-core (and stays there).
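The snippet below does not use LIKWID.jl at all. It only sketches the pin-then-verify pattern that such measurements rely on, using ThreadPinning.jl’s getcpuid and pinthread. The work function is a hypothetical stand-in for the code you would actually monitor. Pinning the thread to the CPU it happens to be running on right now is just one simple choice.

```julia
using ThreadPinning

# Pin the calling Julia thread to the CPU thread it is currently running on,
# so that the work below cannot be moved to another core.
cpu = getcpuid()
pinthread(cpu)

# Hypothetical stand-in for the code whose hardware counters you want to read.
function work(n)
    s = 0.0
    for k in 1:n
        s += sqrt(k)
    end
    return s
end

result = work(10^7)
# If pinning worked, the thread is still on the same CPU thread afterwards.
println("work ran on CPU thread ", cpu, ", now on ", getcpuid(), ", result = ", result)
```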
3) Reducing fluctuations in benchmarks
By fixing a particular Julia thread-core mapping, you reduce fluctuations in performance benchmarks that would otherwise arise from the OS placing Julia threads on different cores between benchmark runs.
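Here is a minimal sketch of how one might check this. It times the same threaded kernel repeatedly, once unpinned and once pinned, and compares the ratio of the slowest to the fastest run. It uses @elapsed instead of BenchmarkTools.jl to keep dependencies light. The kernel chunked_sqrt_sum, the helper timing_spread, the repetition count, and the choice of the :sockets strategy are assumptions for illustration. The numbers you get will depend entirely on your machine.

```julia
using ThreadPinning
using Base.Threads: @threads, nthreads

# Hypothetical threaded kernel: iteration t sums sqrt over its own contiguous chunk.
function chunked_sqrt_sum(x)
    n = nthreads()
    partial = zeros(n)
    @threads :static for t in 1:n
        lo = div((t - 1) * length(x), n) + 1
        hi = div(t * length(x), n)
        s = 0.0
        for i in lo:hi
            @inbounds s += sqrt(x[i])
        end
        partial[t] = s
    end
    return sum(partial)
end

# Time f(x) `reps` times (after one warm-up call) and return the timings
# together with the ratio slowest/fastest as a simple fluctuation measure.
function timing_spread(f, x; reps = 20)
    f(x)
    times = [@elapsed(f(x)) for _ in 1:reps]
    return times, maximum(times) / minimum(times)
end

x = rand(10^7)

unpinthreads()                              # let the OS decide where threads run
_, ratio_unpinned = timing_spread(chunked_sqrt_sum, x)

pinthreads(:sockets)                        # fix one thread-core mapping
times, ratio_pinned = timing_spread(chunked_sqrt_sum, x)

println("max/min unpinned: ", round(ratio_unpinned; digits = 2))
println("max/min pinned:   ", round(ratio_pinned; digits = 2))
```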
(Extra: System analysis: Core-to-core latency measurements)
Limitation: Currently Only Linux
ThreadPinning.jl currently only supports Linux. See here for why this is and how you can help to add Windows support.
Hope you find the package useful. Looking forward to questions, issues, and/or PRs!
All the best,