[ANN] Announcing ThreadPinning.jl

carstenbauer · December 28, 2022, 1:41pm

Hi all,

It’s already been out there for a while but I nonetheless wanted to properly announce ThreadPinning.jl here.

What is ThreadPinning.jl about?

In short, it’s about pinning Julia threads to specific CPU-cores. To that end, it provides

a convenient visualisation of the system topology and the current thread-core mapping, and
tools to fully control and specify the latter.

First impression / demonstration

Dual-socket system where each CPU has 40 hardware threads (20 CPU-cores with 2-way SMT).

Besides using the pinthreads function directly, which is the most powerful interface, you can also specify the thread-core mapping via environment variables or Julia preferences. See this section of the docs for more information.
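
For readers who want something to copy and paste, here is a minimal sketch of what direct use of pinthreads might look like. It assumes ThreadPinning.jl is installed, that Julia was started with several threads (e.g. `julia -t 4`), and that you are on Linux; the `:cores` strategy is just one of the options described in the docs.

```julia
using ThreadPinning  # assumption: package is installed, OS is Linux

# Print the system topology and where the Julia threads currently run.
threadinfo()

# Pin the Julia threads to different CPU-cores, one thread per core.
pinthreads(:cores)

# Query the IDs of the CPU-threads the Julia threads are now running on.
cpuids = getcpuids()
println(cpuids)
```

Calling threadinfo() again after pinning should show the new thread-core mapping.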

Why care about thread pinning?

1) Performance

When you run a multithreaded Julia computation, by default, you’re leaving it to the operating system to distribute your Julia threads among the cores of your CPU(s). While the OS tries to smartly position your threads on the system, it might, for various reasons, make suboptimal choices. For example, it might place multiple Julia threads on the same core or move Julia threads around during the computation. This can dramatically reduce the performance of your computation, especially on more complicated systems such as HPC cluster nodes with multiple CPUs (typically two) and multiple memory domains (NUMA) per CPU.

(Even if you pin your threads, there are, of course, many different thread-core mappings that one can imagine, each of which can lead to very different performance (see here for a simple example).)
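
To see what the OS does by default on your own machine, the following sketch samples the CPU ID of every Julia thread a few times, using only Julia's standard library. It assumes Linux with glibc (for `sched_getcpu`); `current_cpu` and `sample_cpus` are illustrative names and not part of ThreadPinning.jl.

```julia
using Base.Threads

# Ask the OS which CPU-thread the calling thread is currently running on.
# Assumption: Linux with glibc, which provides sched_getcpu.
current_cpu() = Int(ccall(:sched_getcpu, Cint, ()))

# Record the CPU ID of every Julia thread `nsamples` times.
# Row t holds the samples taken by the t-th Julia thread.
function sample_cpus(nsamples::Int = 5)
    samples = zeros(Int, nthreads(), nsamples)
    for s in 1:nsamples
        # :static runs exactly one iteration on each Julia thread.
        @threads :static for t in 1:nthreads()
            samples[t, s] = current_cpu()
        end
        sleep(0.1)  # give the OS scheduler a chance to move threads
    end
    return samples
end

if Sys.islinux()
    display(sample_cpus())
end
```

Without pinning, the values within a row may change between samples, and two rows may show the same CPU ID in the same column. With pinned threads, each row should stay constant.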

2) Hardware performance monitoring

A number of low-level tools operate on a CPU-core level and, to use them, you need to know or need to specify on which CPU-core your Julia thread is running. One example is LIKWID.jl which allows you to measure the performance of Julia code on a hardware level by monitoring hardware counters inside of CPU-cores (see my JuliaCon 2022 talk about this). In fact, LIKWID.jl’s @perform macro uses ThreadPinning.jl under the hood to make sure that the Julia thread actually runs on the monitored CPU-core (and also stays there).

3) Reducing fluctuations in benchmarks

By implementing a particular Julia thread-core mapping, you reduce fluctuations in performance benchmarks which would otherwise originate from the OS deciding to place Julia threads on different cores between benchmark runs.

(Extra: System analysis: Core-to-core latency measurements)

Limitation: Currently Only Linux

ThreadPinning.jl currently only supports Linux. See here for why this is and how you can help to add Windows support.

Hope you find the package useful. Looking forward to questions, issues, and/or PRs.

All the best,
Carsten

Ahmed_Salih · December 28, 2022, 1:56pm

Would this work for me if I run it through WSL?

Just so I can get that kind of overview.

Kind regards

carstenbauer · December 28, 2022, 2:08pm

Good question. I never explicitly considered WSL and I would be surprised if it worked out of the box. OTOH, for threadinfo to work, essentially only lscpu is required (and Sys.islinux() == true since I currently explicitly throw an error for other OSs). Maybe just try it? If it doesn’t work, it would be interesting to see what’s missing, so please file an issue.

AMJ · December 28, 2022, 2:47pm

Great work.

Is something similar being introduced in Julia 1.9?

carstenbauer · December 28, 2022, 3:36pm

Thanks.

Not sure what you mean, but no. This is fine in a package (which can evolve much faster).

AMJ · December 29, 2022, 7:17am

I was mistaken; I was thinking of this presentation that Jeff did about interactive thread pools (at 18:13).

spcogg · December 30, 2022, 6:14am

Just chiming in to say another big use case is for high-availability requests in production workloads! We currently have a Julia backend microservices stack which requires a monitoring heartbeat to keep the service alive every x seconds.

Before thread pinning, when a service was CPU-bound, it couldn’t reply to the cluster keep-alive heartbeat, resulting in the service being killed when it was legitimately busy and not actually dead. So now we keep at least 1 CPU free to reply to the heartbeats, supply the monitoring metrics, etc.

Big props for this package!

carstenbauer · December 30, 2022, 8:55am

What I forgot to mention in the OP: ThreadPinning.jl also has beta-level support for pinning OpenBLAS threads.

See BLAS/LAPACK · ThreadPinning.jl for the function references.

Note, though, that querying the affinity of OpenBLAS threads (i.e. openblas_getcpuids and openblas_print_affinity_masks in the example above) requires Julia >= 1.9 because the underlying API on the OpenBLAS side was only added in OpenBLAS v0.3.21. Pinning the threads “blindly” should also work on Julia >= 1.6.

roh_codeur · September 17, 2023, 2:32pm

hi, thanks for your work on this, sounds awesome.

say, I have Linux running on Docker on a Mac. do you think this package will have the same impact as running on a native Linux distribution?

thanks

carstenbauer · September 17, 2023, 10:00pm

No, unlikely.

carstenbauer · September 17, 2023, 10:01pm

For those who are interested, here is my JuliaCon 2023 talk about ThreadPinning.jl:

filchristou · November 13, 2023, 5:49pm

Since the docs mention that thread pinning helps to make benchmarking more consistent, could you say a few words about how it compares to, or can be combined with, the cpuset approach proposed in Linux-based environments · BenchmarkTools.jl?

carstenbauer · November 13, 2023, 6:19pm

ThreadPinning.jl doesn’t protect the CPU-threads from the scheduler (i.e. other processes). It only ensures that the Julia threads run where you want them to run and don’t migrate away. I would therefore say that the cset shield is complementary.

I personally don’t need a shield because I’m working on HPC compute nodes which are essentially empty (i.e. there are only a few other system processes).

carstenbauer · August 8, 2024, 8:50am

Release of ThreadPinning.jl 1.0

Hey everyone,

I released 1.0 yesterday.

What’s new?

Pinning and visualizing OpenBLAS threads (has been very much improved and is part of the official API now)
Pinning (Julia threads) of MPI ranks
Pinning (Julia threads) of Julia workers (Distributed.jl)
Pinning according to an external affinity mask
Experimental support for pinning GC threads (Julia >= 1.11)
threadinfo() now works on Windows and macOS (but most other things still won’t work)
and much more…

Behind the scenes, there are two new backends: SysInfo.jl (based on Hwloc.jl and lscpu) and ThreadPinningCore.jl. The requirement of lscpu being available has been dropped.

For more information, check out the improved documentation.

Want to transition from pre 1.0?

There are some minor API changes and some (unpopular) features have been dropped. Please consult the API documentation and check out the CHANGELOG.md.

Want to help improve the stability of the package? (< 5 min)

If you have a system (e.g. HPC compute node) that has a particularly interesting/strange hardware topology, please consider adding the system as a “fake testsystem” to SysInfo.jl by following the (very short!) instructions here.
