I observe a subtle issue where the pinning of Julia threads to specific cores gets destroyed by running a seemingly harmless computation. By that I mean that after running the computation, all threads are pinned to the same core(!), which is, of course, horrible for the performance of everything that follows.
MWE
using Base.Threads: @threads, nthreads
using MKL # comment out -> no issue
using LinearAlgebra

# helper functions
sched_getcpu() = Int(@ccall sched_getcpu()::Cint)

function getcpuids()
    nt = nthreads()
    cpuids = zeros(Int, nt)
    @threads :static for tid in 1:nt
        cpuids[tid] = sched_getcpu()
    end
    return cpuids
end

# computation
function computation()
    @threads :static for t in 1:nthreads()
        X = rand(50, 50)
        # X = rand(5, 5) # uncomment -> no issue
        Y = inv(X) # comment out -> no issue
    end
    return nothing
end

# test loop
for i in 1:2
    println("CPUIDs (before): ", getcpuids())
    computation()
    println("CPUIDs (after): ", getcpuids(), " \n")
end
Pinning the threads in a compact manner by using `JULIA_EXCLUSIVE=1` (or, alternatively, ThreadPinning.jl), I obtain the following output (for 10 threads):
CPUIDs (before): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
CPUIDs (after): [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
CPUIDs (before): [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
CPUIDs (after): [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
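The collapse visible in this output can also be checked programmatically instead of by eye. A minimal sketch, where `is_spread` and `is_collapsed` are hypothetical helper names that are not part of the MWE, applied here to the CPU ID vectors printed above:

```julia
# Hypothetical helpers (not part of the MWE) that classify a vector of CPU IDs.

# true if every thread reports a different core
is_spread(cpuids) = allunique(cpuids)

# true if more than one thread is present and all of them report the same core
is_collapsed(cpuids) = length(cpuids) > 1 && all(==(first(cpuids)), cpuids)

# CPU ID vectors copied from the output above
before = collect(0:9)
after = fill(3, 10)

@show is_spread(before) is_collapsed(before)
@show is_spread(after) is_collapsed(after)
```

In the MWE, one could call these on the result of `getcpuids()` before and after `computation()`.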
Note that the issue goes away if we either
- comment out `using MKL`, or
- comment out `Y = inv(X)`, i.e. no BLAS call, or
- uncomment the line `X = rand(5, 5)`, i.e. consider a smaller matrix `X`.
Also note that if we re-pin the threads before each iteration (using `pinthreads(:compact)` from ThreadPinning.jl), we obtain:
CPUIDs (before): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
CPUIDs (after): [8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
CPUIDs (before): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
CPUIDs (after): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
So only the first call to `computation` seems to spoil the pinning.
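For reference, the re-pinning variant of the test loop can be sketched roughly as follows. The helpers are the ones from the MWE (Linux only, since `sched_getcpu` is a glibc function). The names `test_loop` and `repin` are hypothetical: `repin` is just a zero-argument hook that is called before each iteration, so that the block runs even without ThreadPinning.jl and MKL installed.

```julia
using Base.Threads: @threads, nthreads
using LinearAlgebra

# Helpers as in the MWE (Linux only: sched_getcpu comes from glibc).
sched_getcpu() = Int(@ccall sched_getcpu()::Cint)

function getcpuids()
    nt = nthreads()
    cpuids = zeros(Int, nt)
    @threads :static for tid in 1:nt
        cpuids[tid] = sched_getcpu()
    end
    return cpuids
end

function computation()
    @threads :static for t in 1:nthreads()
        X = rand(50, 50)
        Y = inv(X)
    end
    return nothing
end

# `repin` is a hypothetical hook: any zero-argument function that (re)pins
# the Julia threads. The default does nothing.
function test_loop(repin = () -> nothing; iterations = 2)
    for i in 1:iterations
        repin()
        println("CPUIDs (before): ", getcpuids())
        computation()
        println("CPUIDs (after): ", getcpuids(), " \n")
    end
    return nothing
end

test_loop() # no re-pinning
```

To reproduce the output above, one would additionally load `MKL` and `ThreadPinning` and call `test_loop(() -> pinthreads(:compact))`.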
My suspicion is that this is (somehow) related to MKL, perhaps some kind of initialisation which only happens on the first call? But maybe I’m wrong. Anyway, this seems like a very subtle issue that I’d like to understand better and, ideally, fix somehow!
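As a first step towards checking this suspicion, one can inspect which BLAS backend is actually loaded and how many threads it is configured to use. This is only a diagnostic sketch using the `LinearAlgebra.BLAS` functions `get_config` and `get_num_threads` (it assumes Julia 1.7 or newer); it does not explain or fix the pinning issue by itself.

```julia
using LinearAlgebra

# List the BLAS/LAPACK libraries currently loaded.
# Presumably an MKL library shows up here after `using MKL`,
# and OpenBLAS otherwise.
for lib in BLAS.get_config().loaded_libs
    println("BLAS library: ", lib.libname)
end

# Number of threads the BLAS backend is configured to use.
println("BLAS threads: ", BLAS.get_num_threads())
```

Running this once before and once after the first call to `computation()` might show whether anything changes on the BLAS side.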
Any ideas / suggestions would be very much appreciated!
Best,
Carsten