Thread affinitization: pinning Julia threads to cores

Alright, I’ve done some testing and had some discussion on Slack/GitHub. Let me share my findings.

1) Query the core id of a thread.

Let’s start with the second question of the OP first:

Is there a way in Julia to figure out which core a thread is running on?

Thanks @pbayer for the pointer to sched_getcpu(). We can call it in Julia like so:

glibc_coreid() = @ccall sched_getcpu()::Cint

and query the core id of a specific thread using ThreadPools’ @tspawnat:

using ThreadPools
tglibc_coreid(i::Integer) = fetch(@tspawnat i glibc_coreid());

Running the following script on a cluster node,

using ThreadPools
using Base.Threads: nthreads

glibc_coreid() = @ccall sched_getcpu()::Cint
tglibc_coreid(i::Integer) = fetch(@tspawnat i glibc_coreid());

for i in 1:nthreads()
    println("Running on thread $i (glibc_coreid: $(tglibc_coreid(i)))")
end

I get

$ julia -t10 threads_cpuids_glibc.jl
Running on thread 1 (glibc_coreid: 0)
Running on thread 2 (glibc_coreid: 4)
Running on thread 3 (glibc_coreid: 3)
Running on thread 4 (glibc_coreid: 6)
Running on thread 5 (glibc_coreid: 5)
Running on thread 6 (glibc_coreid: 8)
Running on thread 7 (glibc_coreid: 7)
Running on thread 8 (glibc_coreid: 10)
Running on thread 9 (glibc_coreid: 9)
Running on thread 10 (glibc_coreid: 12)

I confirmed with random computations and htop that these core ids are actually correct. Great!
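For reference, the same query can be sketched without ThreadPools. This is only a minimal sketch, assuming Linux with glibc and Julia ≥ 1.5 (for @ccall and the :static schedule); thread_coreids is a hypothetical helper name. With the :static schedule each Julia thread of the default pool runs exactly one iteration, so every thread reports its own core id:

```julia
using Base.Threads: @threads, nthreads, threadid

# Same libc call as in the script above (Linux/glibc only).
glibc_coreid() = @ccall sched_getcpu()::Cint

# Hypothetical helper: returns one (thread id, core id) pair per Julia thread.
function thread_coreids()
    ids = Vector{Tuple{Int,Int}}(undef, nthreads())
    # :static assigns iteration i to a fixed thread, one iteration per thread.
    @threads :static for i in 1:nthreads()
        ids[i] = (threadid(), Int(glibc_coreid()))
    end
    return ids
end

for (tid, core) in thread_coreids()
    println("Running on thread $tid (glibc_coreid: $core)")
end
```

Like the ThreadPools version, this is just a snapshot: unless the threads are pinned, the OS may move them to other cores between two queries.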

What about macOS (and Windows)?

Note that while sched_getcpu() is available on Linux, it isn’t on macOS (and probably not on Windows either?). Looking for a counterpart, I found this SO thread, which mentions that it should be possible using the cpuid machine instruction, which is wrapped by CpuId.jl. We are currently trying to make it work, see CpuId-based sched_getcpu pendant for macOS · Issue #46 · m-j-w/CpuId.jl · GitHub.
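Until there is a portable solution, a script can at least guard the call by platform. The following is a rough sketch, not something tested in this thread: current_coreid is a hypothetical name, the Windows branch assumes the Win32 function GetCurrentProcessorNumber from kernel32 (it is not sched_getcpu() and has not been tried here), and macOS simply returns nothing:

```julia
# Hypothetical platform-guarded query; only the Linux branch matches the tests above.
@static if Sys.islinux()
    # glibc: index of the CPU the calling thread is currently running on.
    current_coreid() = Int(@ccall sched_getcpu()::Cint)
elseif Sys.iswindows()
    # Assumption: Win32 API, returns the processor number within the
    # current processor group (untested in this thread).
    current_coreid() = Int(ccall((:GetCurrentProcessorNumber, "kernel32"), UInt32, ()))
else
    # macOS and others: no known equivalent yet, see the CpuId.jl issue.
    current_coreid() = nothing
end

println("current core id: ", current_coreid())
```

Returning nothing on unsupported platforms keeps the calling code explicit about the missing information instead of reporting a made-up core id.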

2) Pinning threads to specific cores

Using the script from above (and htop as a crosscheck) I can confirm that JULIA_EXCLUSIVE=1 forces Julia to put the threads on the first nthreads() cores, i.e. core ids 0:nthreads()-1:

$ JULIA_EXCLUSIVE=1 julia -t10 threads_cpuids_glibc.jl
Running on thread 1 (glibc_coreid: 0)
Running on thread 2 (glibc_coreid: 1)
Running on thread 3 (glibc_coreid: 2)
Running on thread 4 (glibc_coreid: 3)
Running on thread 5 (glibc_coreid: 4)
Running on thread 6 (glibc_coreid: 5)
Running on thread 7 (glibc_coreid: 6)
Running on thread 8 (glibc_coreid: 7)
Running on thread 9 (glibc_coreid: 8)
Running on thread 10 (glibc_coreid: 9)

But what about choosing other cores? I tried using numactl --physcpubind first:

$ numactl --physcpubind=3,5,7,12 julia -t4 threads_cpuids_glibc.jl
Running on thread 1 (glibc_coreid: 3)
Running on thread 2 (glibc_coreid: 12)
Running on thread 3 (glibc_coreid: 5)
Running on thread 4 (glibc_coreid: 12)

Note that the threads indeed run on cores from the given list. However, two threads happen to run on the same core. Trying this multiple times, I can see no clear pattern: both the thread → core id mapping and which core (if any) hosts more than one thread vary from run to run. So my takeaway is that numactl only allows us to restrict the Julia threads to a given set of cores; it does not pin each thread to a core of its own.
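One way to look at this restriction from inside Julia is to read the CPU affinity mask of the calling thread. The following is a minimal sketch, assuming Linux with glibc; allowed_coreids and the maxcpus keyword are hypothetical names, and that the numactl --physcpubind list shows up in this mask is an assumption, not something checked in the runs above:

```julia
# Number of CPU bits stored in one word of glibc's cpu_set_t.
const BITS_PER_WORD = 8 * sizeof(Culong)

# Hypothetical helper: core ids the calling thread is allowed to run on.
function allowed_coreids(; maxcpus::Integer = 1024)
    mask = zeros(Culong, cld(maxcpus, BITS_PER_WORD))
    # pid 0 means "the calling thread"; the mask is filled in place.
    ret = @ccall sched_getaffinity(0::Cint, sizeof(mask)::Csize_t,
                                   mask::Ptr{Culong})::Cint
    ret == 0 || error("sched_getaffinity failed (errno = $(Base.Libc.errno()))")
    # Bit k of word w corresponds to core id w * BITS_PER_WORD + k.
    return [cpu for cpu in 0:maxcpus-1
            if (mask[cpu ÷ BITS_PER_WORD + 1] >> (cpu % BITS_PER_WORD)) & 1 == 1]
end

println("Allowed core ids: ", allowed_coreids())
```

If the mask lists several cores, the OS is free to place a thread on any of them, which would be consistent with two threads ending up on the same core.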

I also tried likwid-pin -c. Strangely, I had to specify one more core id than Julia threads to prevent the “Roundrobin placement triggered” message (which almost always indicates that something is wrong). I found:

$ likwid-pin -c 0,9,14,32,76 julia -t4 threads_cpuids_glibc.jl
[pthread wrapper]
[pthread wrapper] MAIN -> 0
[pthread wrapper] PIN_MASK: 0->9  1->14  2->32  3->76
[pthread wrapper] SKIP MASK: 0x0
	threadid 22624924616448 -> hwthread 9 - OK
	threadid 22624673154816 -> hwthread 14 - OK
	threadid 22624657823488 -> hwthread 32 - OK
	threadid 22624642492160 -> hwthread 76 - OK
Running on thread 1 (glibc_coreid: 0)
Running on thread 2 (glibc_coreid: 14)
Running on thread 3 (glibc_coreid: 32)
Running on thread 4 (glibc_coreid: 76)

That’s almost what we want! However, it’s odd that we have to provide one more cpu id and that the second id (9 here) isn’t used by any Julia thread. Trying one more time:

$ likwid-pin -c 0,40,41,42,43,44,45,46,47,48,49,50 julia -t10 threads_cpuids_glibc.jl
[pthread wrapper]
[pthread wrapper] MAIN -> 0
[pthread wrapper] PIN_MASK: 0->40  1->41  2->42  3->43  4->44  5->45  6->46  7->47  8->48  9->49  10->50
[pthread wrapper] SKIP MASK: 0x0
	threadid 23027507218176 -> hwthread 40 - OK
	threadid 23027249415936 -> hwthread 41 - OK
	threadid 23027234084608 -> hwthread 42 - OK
	threadid 23027218753280 -> hwthread 43 - OK
	threadid 23027203421952 -> hwthread 44 - OK
	threadid 23026984806144 -> hwthread 45 - OK
	threadid 23026970117888 -> hwthread 46 - OK
	threadid 23026955429632 -> hwthread 47 - OK
	threadid 23026940741376 -> hwthread 48 - OK
	threadid 23026933802752 -> hwthread 49 - OK
Running on thread 1 (glibc_coreid: 0)
Running on thread 2 (glibc_coreid: 41)
Running on thread 3 (glibc_coreid: 42)
Running on thread 4 (glibc_coreid: 43)
Running on thread 5 (glibc_coreid: 44)
Running on thread 6 (glibc_coreid: 45)
Running on thread 7 (glibc_coreid: 46)
Running on thread 8 (glibc_coreid: 47)
Running on thread 9 (glibc_coreid: 48)
Running on thread 10 (glibc_coreid: 49)

This seems to be consistent, but it probably needs a bit more testing across different architectures. (I had tested this yesterday as well and I thought that I had multiple threads on the same core here as well… but maybe I’m misremembering.)

Note that JULIA_EXCLUSIVE=1 overrides both numactl and likwid-pin and puts Julia’s threads on core ids 0:nthreads()-1 irrespective of the provided cpu id list:

$ JULIA_EXCLUSIVE=1 numactl --physcpubind=9,14,32,76 julia -t4 threads_cpuids_glibc.jl
Running on thread 1 (glibc_coreid: 0)
Running on thread 2 (glibc_coreid: 1)
Running on thread 3 (glibc_coreid: 2)
Running on thread 4 (glibc_coreid: 3)

$ JULIA_EXCLUSIVE=1 likwid-pin -c 9,14,32,76,77 julia -t4 threads_cpuids_glibc.jl
[pthread wrapper]
[pthread wrapper] MAIN -> 9
[pthread wrapper] PIN_MASK: 0->14  1->32  2->76  3->77
[pthread wrapper] SKIP MASK: 0x0
	threadid 22926639785728 -> hwthread 14 - OK
	threadid 22926388324096 -> hwthread 32 - OK
	threadid 22926372992768 -> hwthread 76 - OK
	threadid 22926357661440 -> hwthread 77 - OK
Running on thread 1 (glibc_coreid: 0)
Running on thread 2 (glibc_coreid: 1)
Running on thread 3 (glibc_coreid: 2)
Running on thread 4 (glibc_coreid: 3)
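As an alternative to external tools, pinning can also be sketched from within Julia by setting each thread's affinity mask to a single core. This is only a sketch under assumptions, not something tested in this thread: it assumes Linux with glibc and Julia ≥ 1.5, the names pin_current_thread and pin_threads are hypothetical, and how it interacts with JULIA_EXCLUSIVE=1 or likwid-pin is unknown:

```julia
using Base.Threads: @threads, nthreads

# Number of CPU bits stored in one word of glibc's cpu_set_t.
const BITS_PER_WORD = 8 * sizeof(Culong)

current_coreid() = Int(@ccall sched_getcpu()::Cint)

# Hypothetical helper: restrict the calling thread to the single core `cpu`.
function pin_current_thread(cpu::Integer; maxcpus::Integer = 1024)
    0 <= cpu < maxcpus || throw(ArgumentError("core id $cpu out of range"))
    mask = zeros(Culong, cld(maxcpus, BITS_PER_WORD))
    mask[cpu ÷ BITS_PER_WORD + 1] |= Culong(1) << (cpu % BITS_PER_WORD)
    # pid 0 means "the calling thread".
    ret = @ccall sched_setaffinity(0::Cint, sizeof(mask)::Csize_t,
                                   mask::Ptr{Culong})::Cint
    ret == 0 || error("sched_setaffinity failed (errno = $(Base.Libc.errno()))")
    return nothing
end

# Hypothetical helper: pin the i-th Julia thread to the i-th entry of `cpus`.
function pin_threads(cpus)
    ids = collect(Int, cpus)
    length(ids) >= nthreads() ||
        throw(ArgumentError("need at least one core id per Julia thread"))
    # :static runs exactly one iteration on each Julia thread.
    @threads :static for i in 1:nthreads()
        pin_current_thread(ids[i])
    end
    return nothing
end
```

With julia -t4, calling pin_threads([3, 5, 7, 12]) before the query loop of threads_cpuids_glibc.jl would be the in-process analogue of the numactl and likwid-pin calls above. The core ids must be valid on the machine and allowed for the process, otherwise sched_setaffinity fails.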

What about macOS (and Windows)?

Probably no chance? Both numactl and likwid-pin are only available on Linux (please correct me if I’m wrong / if there are alternatives or workarounds).

(cc @Elrod, @vchuravy)
