Hello,
I would like to begin by clarifying that I have very little knowledge of multithreading. I was experimenting with the following code, using the MKL library to perform matrix multiplications with different numbers of threads:
using LinearAlgebra, MKL
N = 10_000
A = rand(Float32, N, N)
B = rand(Float32, N, N)
function matrix_multiplication(A, B)
    return A * B
end
N_threads = [1, 2, 4, 8, 16, 32]
for threads in N_threads
    BLAS.set_num_threads(threads)
    @time matrix_multiplication(A, B)
end
I found the following times:
11.859840 seconds (2 allocations: 381.470 MiB)
5.999158 seconds (2 allocations: 381.470 MiB, 0.66% gc time)
2.999295 seconds (2 allocations: 381.470 MiB)
1.522145 seconds (2 allocations: 381.470 MiB)
1.321285 seconds (2 allocations: 381.470 MiB, 2.63% gc time)
1.301005 seconds (2 allocations: 381.470 MiB)
which clearly do not scale with the number of threads once it gets bigger than approximately 8 (for 2 and 4 threads the scaling is as expected).
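To put numbers on the scaling, here is a minimal sketch that computes the speedup and parallel efficiency from the timings listed above. The variable names (`thread_counts`, `times`, `speedup`, `efficiency`) are only illustrative, and the values are copied by hand from the `@time` output, so this is a quick sanity check rather than a proper benchmark.

```julia
# Thread counts and wall-clock times (in seconds) copied from the @time output above
thread_counts = [1, 2, 4, 8, 16, 32]
times = [11.859840, 5.999158, 2.999295, 1.522145, 1.321285, 1.301005]

# Speedup relative to the single-threaded run
speedup = times[1] ./ times

# Parallel efficiency: speedup divided by thread count (1.0 would be perfect linear scaling)
efficiency = speedup ./ thread_counts

for (n, s, e) in zip(thread_counts, speedup, efficiency)
    println("threads = ", n, "  speedup = ", round(s, digits = 2),
            "  efficiency = ", round(e, digits = 2))
end
```

With these numbers the efficiency stays above 0.95 up to 8 threads and then drops sharply at 16 and 32 threads.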
My versioninfo is:
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13900KF
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
There is clearly something I am missing due to my lack of knowledge in this field. Can anyone point me in the right direction to understand why I do not get linear scaling with the number of threads?