Julia Thread Affinity not persistent when calling MKL function

carstenbauer · January 13, 2022, 3:49pm

I observe a subtle issue where the pinning of Julia threads to specific cores is spoiled massively by running a seemingly harmless computation. By "spoiled" I mean that after running the computation, all threads are pinned to the same core(!), which is, of course, horrible for the performance of everything that follows.

MWE

using Base.Threads: @threads, nthreads
using MKL # comment out -> no issue
using LinearAlgebra

# helper functions
sched_getcpu() = Int(@ccall sched_getcpu()::Cint)
function getcpuids()
    nt = nthreads()
    cpuids = zeros(Int, nt)
    @threads :static for tid in 1:nt
        cpuids[tid] = sched_getcpu()
    end
    return cpuids
end

# computation
function computation()
    @threads :static for t in 1:nthreads()
        X = rand(50, 50)
        # X = rand(5, 5) # uncomment -> no issue
        Y = inv(X) # comment out -> no issue
    end
    return nothing
end

# test loop
for i in 1:2
    println("CPUIDs (before): ", getcpuids())
    computation()
    println("CPUIDs (after): ", getcpuids(), " \n")
end

After pinning the threads compactly via JULIA_EXCLUSIVE=1 (or, alternatively, ThreadPinning.jl), I obtain the following output (for 10 threads):

CPUIDs (before): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
CPUIDs (after): [3, 3, 3, 3, 3, 3, 3, 3, 3, 3] 

CPUIDs (before): [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
CPUIDs (after): [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
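To make "spoiled" checkable in a script rather than by eye, here are two small helpers applied to the numbers above. The names moved_threads and pinning_intact are hypothetical (my own), not from any package:

```julia
# Hypothetical helpers (not part of the MWE) for checking getcpuids() output.
# Indices of the Julia threads whose CPU changed between the two snapshots.
moved_threads(before, after) = findall(before .!= after)

# Pinning is intact if no thread moved and no two threads share a CPU.
pinning_intact(before, after) = isempty(moved_threads(before, after)) && allunique(after)

before = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
after  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
moved_threads(before, after)   # [1, 2, 3, 5, 6, 7, 8, 9, 10] (thread 4 was already on CPU 3)
pinning_intact(before, after)  # false
```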

Note that the issue goes away if we do any one of the following:

- comment out using MKL,
- comment out Y = inv(X), i.e. no BLAS call, or
- uncomment the line X = rand(5, 5), i.e. use a smaller matrix X.
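To try the small and large cases without editing the file, a variant of computation could take the matrix size as an argument. This overload is hypothetical, not part of the original MWE:

```julia
using Base.Threads: @threads, nthreads
using LinearAlgebra

# Hypothetical variant of the MWE's `computation` with the matrix size `n`
# as an argument, so the 5x5 and 50x50 cases can be tried in one session.
function computation(n::Integer)
    @threads :static for t in 1:nthreads()
        X = rand(n, n)
        Y = inv(X)  # the linear-algebra call; removing it makes the issue disappear
    end
    return nothing
end
```

Since only the first affected call seems to spoil the pinning for the rest of the session, computation(5) would have to run before computation(50), or the threads be re-pinned in between, for the comparison to be meaningful.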

Also note that if we re-pin the threads before each iteration (using pinthreads(:compact) from ThreadPinning.jl), we obtain:

CPUIDs (before): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
CPUIDs (after): [8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 

CPUIDs (before): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
CPUIDs (after): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

So only the first call to computation() seems to spoil the pinning.

My suspicion is that this is (somehow) related to MKL, perhaps some kind of initialisation that only happens on the first call? But maybe I'm wrong. Anyway, this seems like a very subtle issue that I'd like to understand better and, ideally, fix somehow!
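One way to probe this further (a diagnostic sketch, not something suggested in the thread): sched_getcpu only reports where a thread is running right now, whereas the affinity mask reports where it is allowed to run. Printing the masks before and after computation() would show whether each thread's mask was rewritten, e.g. narrowed to the single core that all threads end up on. The helper names allowed_cpus and getaffinities are hypothetical, and the code assumes Linux with glibc on a little-endian machine:

```julia
using Base.Threads: @threads, nthreads

# Linux/glibc only. Returns the CPUs the calling OS thread is *allowed* to run on.
# Assumptions: 128 bytes (1024 bits) matches glibc's default cpu_set_t, and the
# machine is little-endian, so CPU k is bit (k % 8) of byte (k ÷ 8) in the buffer.
function allowed_cpus()
    mask = zeros(UInt8, 128)
    # pid 0 means "the calling thread"
    ret = @ccall sched_getaffinity(0::Cint, sizeof(mask)::Csize_t, mask::Ptr{UInt8})::Cint
    ret == 0 || error("sched_getaffinity failed")
    cpus = Int[]
    for (i, byte) in enumerate(mask), bit in 0:7
        if ((byte >> bit) & 0x01) == 0x01
            push!(cpus, 8 * (i - 1) + bit)
        end
    end
    return cpus
end

# One list of allowed CPUs per Julia thread, gathered like getcpuids() in the MWE.
function getaffinities()
    nt = nthreads()
    masks = Vector{Vector{Int}}(undef, nt)
    @threads :static for tid in 1:nt
        masks[tid] = allowed_cpus()
    end
    return masks
end
```

With JULIA_EXCLUSIVE=1, each thread's list should initially contain just its own core, so a list that changes after computation() would point to the affinity being overwritten rather than the threads merely migrating.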

Any ideas / suggestions would be very much appreciated!

Best,
Carsten

(@tkf, @vchuravy)

carstenbauer · January 13, 2022, 6:15pm

Mentioned by @vchuravy on Slack: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-set-affinity-of-threads-spawned-by-MKL/td-p/1026152

carstenbauer · January 14, 2022, 8:10pm

With MKL_DYNAMIC=false and MKL_NUM_THREADS=1 (or, alternatively, BLAS.set_num_threads(1)), I get the desired behavior:

CPUIDs (before): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
CPUIDs (after): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

CPUIDs (before): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
CPUIDs (after): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
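For reference, a sketch of applying this from within Julia. It assumes MKL reads MKL_DYNAMIC and MKL_NUM_THREADS when it is initialized, so the variables are set before MKL is loaded; exporting them in the shell before starting Julia avoids relying on that ordering. The conditional load via Base.find_package (an internal Base function) is only there so the snippet also runs where MKL.jl is not installed:

```julia
# Assumption: MKL picks these up at initialization, so set them before loading it.
ENV["MKL_DYNAMIC"] = "false"
ENV["MKL_NUM_THREADS"] = "1"

using LinearAlgebra
# Load MKL only if it is installed, so the sketch also runs with the default OpenBLAS.
Base.find_package("MKL") === nothing || @eval using MKL

# Runtime alternative from the post: limit BLAS (MKL, if loaded) to one thread.
BLAS.set_num_threads(1)
println("BLAS threads: ", BLAS.get_num_threads())
```

Limiting BLAS to a single thread is typically what you want anyway when the outer loop is already parallelized over Julia threads, since otherwise the library's own threads compete with the Julia threads for the same cores.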
