[ANN] Announcing ThreadPinning.jl

Hi all,

It’s already been out there for a while, but I nonetheless wanted to properly announce ThreadPinning.jl here.

What is ThreadPinning.jl about?

In short, it’s about pinning Julia threads to specific CPU-cores. To that end, it provides

  1. a convenient visualisation of the system topology and the current thread-core mapping, and
  2. tools to fully control and specify the latter.

First impression / demonstration

The demo uses a dual-socket system where each CPU has 40 hardware threads (20 CPU-cores with 2-way SMT).
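For readers without the demo at hand, a minimal session might look like the sketch below. It assumes ThreadPinning.jl is installed, Julia runs on Linux and was started with several threads (e.g. julia -t 4). :cores is one of the package's built-in pinning strategies. The exact threadinfo() output depends on the machine and the package version.

```julia
using ThreadPinning

threadinfo()          # system topology and the current thread-to-CPU mapping
pinthreads(:cores)    # pin the Julia threads to physical cores, filled in order
threadinfo()          # inspect the mapping again after pinning
cpuids = getcpuids()  # CPU ID of every Julia thread, as a vector
println(cpuids)
```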

Besides using the pinthreads function directly, which is the most powerful interface, you can also specify the thread-core mapping via environment variables or Julia preferences. See this section of the docs for more information.
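As a sketch of the explicit interface, pinthreads also accepts a vector of CPU IDs, with the i-th Julia thread pinned to the i-th entry. The ordering below (highest CPU IDs first) is an arbitrary illustration. The code assumes that the OS numbers CPUs contiguously from 0 to Sys.CPU_THREADS - 1 and that all of them are usable by the Julia process.

```julia
using ThreadPinning

nt = Threads.nthreads()
nt <= Sys.CPU_THREADS || error("more Julia threads than CPU threads")

# Arbitrary example mapping: Julia thread i -> CPU ID (Sys.CPU_THREADS - i).
# Assumes CPU IDs 0:Sys.CPU_THREADS-1 exist and are allowed for this process.
cpuids = [Sys.CPU_THREADS - i for i in 1:nt]
pinthreads(cpuids)
println(getcpuids())  # should now match `cpuids`
```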

Why care about thread pinning?

1) Performance

When you run a multithreaded Julia computation, by default, you’re leaving it to the operating system to distribute your Julia threads among the cores of your CPU(s). While the OS tries to smartly position your threads on the system, it might, for various reasons, make suboptimal choices. For example, it might place multiple Julia threads on the same core or move Julia threads around during the computation. This can dramatically reduce the performance of your computation, especially on more complicated systems such as HPC cluster nodes with multiple CPUs (typically two) and multiple memory domains (NUMA) per CPU.
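One way to check whether the OS moves threads around is to sample getcpuids() repeatedly, once before and once after pinning. The following is a minimal sketch, assuming ThreadPinning.jl on Linux. The helper name mapping_is_stable is hypothetical. An unpinned run may or may not show migrations on a given system.

```julia
using ThreadPinning

# Hypothetical helper: run some multithreaded work `nsamples` times, record the
# thread-to-CPU mapping after each run, and report whether it ever changed.
function mapping_is_stable(; nsamples = 10, n = 10^6)
    x = zeros(n)
    samples = map(1:nsamples) do _
        Threads.@threads for i in eachindex(x)
            x[i] = sin(i)  # arbitrary work to keep the threads busy
        end
        getcpuids()
    end
    return all(==(samples[1]), samples)
end

println("unpinned, mapping stable: ", mapping_is_stable())
pinthreads(:cores)
println("pinned,   mapping stable: ", mapping_is_stable())
```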

(Even with pinned threads, there are many possible thread-core mappings, and they can lead to very different performance; see here for a simple example.)
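To compare two mappings yourself, one minimal approach is to time a memory-bound, STREAM-like triad under two of ThreadPinning.jl's strategies. Roughly speaking, on a multi-socket system :cores fills the cores of one socket before moving on, while :sockets alternates between sockets. With fewer threads than cores, the two can therefore reach different memory bandwidths. The kernel, the array size, and the triad_bandwidth helper are illustrative assumptions. The GB/s figure ignores write-allocate traffic.

```julia
using ThreadPinning
using Base.Threads: @threads

# STREAM-like triad a = b + s*c, statically split over the Julia threads.
function triad!(a, b, c, s)
    @threads :static for i in eachindex(a)
        @inbounds a[i] = b[i] + s * c[i]
    end
    return a
end

# Hypothetical helper: pin with `strategy`, initialise the arrays in parallel
# with the same static schedule (first touch, so pages land in the NUMA domain
# of the thread that later uses them), and return the best bandwidth in GB/s.
function triad_bandwidth(strategy::Symbol; n = 2^22, reps = 5)
    pinthreads(strategy)
    a = Vector{Float64}(undef, n)
    b = similar(a)
    c = similar(a)
    @threads :static for i in eachindex(a)
        a[i] = 0.0
        b[i] = 1.0
        c[i] = 2.0
    end
    triad!(a, b, c, 3.0)  # warm-up (includes compilation)
    tmin = minimum([@elapsed(triad!(a, b, c, 3.0)) for _ in 1:reps])
    return 3 * sizeof(Float64) * n / tmin / 1e9  # 2 loads + 1 store per element
end

for strategy in (:cores, :sockets)
    println(strategy, ": ", round(triad_bandwidth(strategy); digits = 1), " GB/s")
end
```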

2) Hardware performance monitoring

A number of low-level tools operate on the CPU-core level, so to use them you need to know, or specify, on which CPU-core your Julia thread is running. One example is LIKWID.jl, which allows you to measure the performance of Julia code on a hardware level by monitoring hardware counters inside CPU-cores (see my JuliaCon 2022 talk about this). In fact, LIKWID.jl’s @perfmon macro uses ThreadPinning.jl under the hood to make sure that the Julia thread actually runs on the monitored CPU-core (and stays there).
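Here is a minimal sketch of the "know where your thread runs" part. It pins the calling thread to the CPU it currently runs on and checks that it stays there while working. It assumes ThreadPinning.jl on Linux. busywork is a hypothetical stand-in for the code you would monitor (e.g. with LIKWID.jl). The loop contains no yield points, so the task does not switch Julia threads.

```julia
using ThreadPinning

# Hypothetical stand-in for the code you would monitor.
function busywork(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

# Run `busywork` repeatedly and check after each call that we are still on `cpu`.
# No yield points inside, so the task does not switch to another Julia thread.
function check_stays_on(cpu; iters = 100)
    acc = 0.0
    for _ in 1:iters
        acc += busywork(10^5)
        if getcpuid() != cpu
            return false, acc
        end
    end
    return true, acc
end

cpu = getcpuid()  # CPU the calling Julia thread currently runs on
pinthread(cpu)    # pin the calling thread to exactly this CPU
stayed, _ = check_stays_on(cpu)
println("stayed on CPU $cpu: ", stayed)
```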

3) Reducing fluctuations in benchmarks

By implementing a particular Julia thread-core mapping, you reduce fluctuations in performance benchmarks which would otherwise originate from the OS deciding to place Julia threads on different cores between benchmark runs.
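A rough way to quantify this is to time the same multithreaded kernel repeatedly, first unpinned and then pinned, and compare the run-to-run spread. The sketch below assumes ThreadPinning.jl on Linux and otherwise uses only Base and the Statistics standard library. threaded_sum and relative_spread are hypothetical helpers. Whether the spread shrinks, and by how much, depends on the system and its load.

```julia
using ThreadPinning
using Statistics: mean, std

# Hypothetical kernel: each Julia thread sums one contiguous chunk of x.
function threaded_sum(x)
    nt = Threads.nthreads()
    partial = zeros(eltype(x), nt)
    Threads.@threads :static for t in 1:nt
        lo = div((t - 1) * length(x), nt) + 1
        hi = div(t * length(x), nt)
        partial[t] = sum(view(x, lo:hi))
    end
    return sum(partial)
end

relative_spread(times) = std(times) / mean(times)
measure(x; reps = 20) = [@elapsed(threaded_sum(x)) for _ in 1:reps]

x = rand(2^22)
threaded_sum(x)  # compile before timing

t_unpinned = measure(x)
pinthreads(:cores)
t_pinned = measure(x)

println("unpinned: min = ", minimum(t_unpinned), " s, rel. std = ", relative_spread(t_unpinned))
println("pinned:   min = ", minimum(t_pinned), " s, rel. std = ", relative_spread(t_pinned))
```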

(Extra: System analysis: Core-to-core latency measurements)
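For core-to-core latency measurements, here is a minimal ping-pong sketch. Two Julia threads each pin themselves to a chosen CPU with pinthread, then bounce an atomic counter back and forth, and the function reports the average one-way time. This is an illustrative implementation under stated assumptions (at least two Julia threads, valid CPU IDs, an otherwise idle system). It is not necessarily how ThreadPinning.jl itself measures latencies. pingpong_latency is a hypothetical name.

```julia
using ThreadPinning
using Base.Threads: Atomic, @threads

# Hypothetical helper: average one-way latency (in ns) between CPUs cpu_a and cpu_b.
function pingpong_latency(cpu_a::Integer, cpu_b::Integer; nrounds::Integer = 10_000)
    Threads.nthreads() >= 2 || error("start Julia with at least 2 threads")
    flag = Atomic{Int}(0)   # the shared value bounced between the two cores
    ready = Atomic{Int}(0)  # start barrier
    elapsed = Ref{UInt64}(0)
    @threads :static for role in 1:2
        pinthread(role == 1 ? cpu_a : cpu_b)  # each thread pins itself
        Threads.atomic_add!(ready, 1)
        while ready[] < 2
            GC.safepoint()  # let a pending GC proceed instead of deadlocking
        end
        if role == 1
            t0 = time_ns()
            for r in 1:nrounds
                flag[] = 2r - 1      # ping
                while flag[] != 2r   # wait for pong
                    GC.safepoint()
                end
            end
            elapsed[] = time_ns() - t0
        else
            for r in 1:nrounds
                while flag[] != 2r - 1  # wait for ping
                    GC.safepoint()
                end
                flag[] = 2r             # pong
            end
        end
    end
    return elapsed[] / (2 * nrounds)  # each round consists of two one-way hops
end

if Threads.nthreads() >= 2 && Sys.CPU_THREADS >= 2
    println("CPU 0 <-> CPU 1: ", pingpong_latency(0, 1), " ns")
end
```

Scanning all CPU pairs this way yields a latency matrix. Note that both threads stay pinned after the call.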

Limitation: Currently Only Linux

ThreadPinning.jl currently only supports Linux. See here for why this is and how you can help to add Windows support.

Hope you find the package useful. Looking forward to questions, issues, and/or PRs.

All the best,
Carsten


Would this work for me if I run it through WSL?

Just so I can get that kind of overview.

Kind regards


Good question. I never explicitly considered WSL, and I would be surprised if it worked out of the box. OTOH, for threadinfo to work, essentially only lscpu is required (and Sys.islinux() == true, since I currently explicitly throw an error for other OSs). Maybe just try it? If it doesn’t work, it would be interesting to see what’s missing, so please file an issue.
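The two requirements mentioned above can be checked from Julia before trying threadinfo. This is a minimal sketch using only Base. threadinfo_prereqs is a hypothetical name.

```julia
# Check the prerequisites for `threadinfo`: a Linux system and an `lscpu` executable.
function threadinfo_prereqs()
    return (is_linux = Sys.islinux(), has_lscpu = Sys.which("lscpu") !== nothing)
end

prereqs = threadinfo_prereqs()
println(prereqs)
if all(prereqs)
    print(read(`lscpu`, String))  # the topology information threadinfo relies on
end
```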


Great work.

Is something similar being introduced in Julia 1.9?

Thanks.

Not sure what you mean, but no. This is fine in a package (which can evolve much faster).

I was mistaken; I was thinking of this presentation by Jeff about interactive thread pools (at 18:13).

Just chiming in to say that another big use case is high-availability services in production workloads! We currently have a Julia backend microservices stack which requires a monitoring heartbeat every x seconds to keep each service alive.

Before thread pinning, when a service was CPU-bound, it couldn’t reply to the cluster keep-alive heartbeat, so the service was killed when it was legitimately busy and not actually dead. So now we keep at least 1 CPU free to reply to the heartbeats, supply the monitoring metrics, etc.
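The setup described above can be sketched as follows: keep one CPU free and pin all Julia threads to the remaining ones. Reserving CPU 0 is an arbitrary choice. The code assumes contiguous CPU IDs 0 to Sys.CPU_THREADS - 1, and Julia must be started with at most Sys.CPU_THREADS - 1 threads. Non-Julia threads and other processes are not pinned, so the OS remains free to schedule them on the reserved CPU.

```julia
using ThreadPinning

reserved_cpu = 0  # hypothetical choice: CPU left free for heartbeats/monitoring
compute_cpus = [c for c in 0:Sys.CPU_THREADS-1 if c != reserved_cpu]

nt = Threads.nthreads()
nt <= length(compute_cpus) ||
    error("start Julia with at most $(length(compute_cpus)) threads, e.g. julia -t $(length(compute_cpus))")

pinthreads(compute_cpus[1:nt])
println("Julia threads on CPUs ", getcpuids(), "; CPU $reserved_cpu kept free")
```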

Big props for this package!


What I forgot to mention in the OP: ThreadPinning.jl also has beta-level support for pinning OpenBLAS threads.

See the BLAS/LAPACK page of the ThreadPinning.jl docs for the function reference.

Note, though, that querying the affinity of OpenBLAS threads (i.e. openblas_getcpuids and openblas_print_affinity_masks) requires Julia >= 1.9, because the underlying API on the OpenBLAS side was only added in OpenBLAS v0.3.21. Pinning the threads “blindly” should also work on Julia >= 1.6.
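Here is a minimal sketch of the OpenBLAS functions mentioned above. It assumes a recent ThreadPinning.jl on Linux and contiguous CPU IDs. It also assumes that openblas_pinthreads accepts a vector of CPU IDs, one per OpenBLAS thread. In line with the version note above, the query functions are only called on Julia >= 1.9. The choice of at most four BLAS threads is arbitrary.

```julia
using LinearAlgebra
using ThreadPinning

nblas = min(4, Sys.CPU_THREADS)  # arbitrary number of OpenBLAS threads
BLAS.set_num_threads(nblas)

# Pin the OpenBLAS threads, in order, to the CPU IDs 0:nblas-1
# (assumes these CPU IDs exist and are usable by this process).
blas_cpuids = collect(0:nblas-1)
openblas_pinthreads(blas_cpuids)

if VERSION >= v"1.9"
    println(openblas_getcpuids())    # where the OpenBLAS threads run
    openblas_print_affinity_masks()  # their affinity masks
end
```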


Hi, thanks for your work on this, sounds awesome.

Say I have Linux running in Docker on a Mac. Do you think this package will have the same impact as on a native Linux distribution?

thanks

No, unlikely.

For those who are interested, here is my JuliaCon 2023 talk about ThreadPinning.jl:


Since the docs mention that thread pinning helps make benchmarking more consistent, could you say a few words about how it compares to, or can be combined with, the cpuset approach proposed in the “Linux-based environments” section of the BenchmarkTools.jl docs?


ThreadPinning.jl doesn’t protect the CPU-threads from the scheduler (i.e. from other processes). It only ensures that the Julia threads run where you want them to run and don’t migrate away. I would therefore say that the cset shield is complementary.
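The shield restricts which CPUs the process may use at all, while ThreadPinning.jl decides where each Julia thread runs within that set, so it can help to look at both. Here is a minimal Linux-only sketch. It reads the Cpus_allowed_list field from /proc/self/status and compares it with getcpuids(). That field typically reflects the affinity of the process's main thread, so read it before pinning. allowed_cpus is a hypothetical helper.

```julia
using ThreadPinning

# Hypothetical helper: the CPU list the kernel allows for this process (e.g. a
# narrower range inside a `cset shield`), as reported by /proc/self/status.
function allowed_cpus()
    for line in eachline("/proc/self/status")
        if startswith(line, "Cpus_allowed_list:")
            return String(strip(split(line, ':'; limit = 2)[2]))
        end
    end
    return nothing
end

println("allowed CPUs for this process: ", allowed_cpus())
println("Julia threads currently on:    ", getcpuids())
```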

I personally don’t need a shield because I’m working on HPC compute nodes which are essentially empty (i.e. there are only a few other system processes).
