When I tested my code, I found that the time consumed by `Threads.@threads` in the actual environment is longer than in the test environment. I think the reason is that other calculations in the actual environment affect the performance of `Threads.@threads`, so I wrote a demo to show this:
```julia
using TimerOutputs
using LinearAlgebra
using Polyester                        # imported, but not used below

@show Threads.nthreads()
LinearAlgebra.BLAS.set_num_threads(1)  # keep BLAS single-threaded

# A time-consuming serial calculation: 1000 products into one matrix
function testmul0(A, B, C)
    for i in 1:1000
        mul!(C, A, B)
    end
end

# Serial version of the threaded loop (only used in a commented-out line)
function testmul(A, B, C)
    for i in 1:100
        mul!(C[i], A, B)
    end
end

# Threaded version: 100 independent products, one output matrix each
function testmul_thread(A, B, C)
    Threads.@threads for i in 1:100
        mul!(C[i], A, B)
    end
end

# Case 1: all testmul0 calls first, then all testmul_thread calls
function testmuls(A, B, C)
    for i in 1:20
        @timeit "testmul0" testmul0(A, B, C[1])
    end
    for i in 1:20
        @timeit "testmul_th" testmul_thread(A, B, C)
    end
end

# Case 2: testmul0 and testmul_thread alternate in a single loop
function testmuls2(A, B, C)
    for i in 1:20
        # @timeit "testmul" testmul(A, B, C)
        @timeit "testmul0" testmul0(A, B, C[1])
        @timeit "testmul_th" testmul_thread(A, B, C)
    end
end

A = rand(100, 100)
B = rand(100, 100)
C = [rand(100, 100) for i in 1:100]
testmuls(A, B, C);   # first run (compilation)
testmuls2(A, B, C);

function bar()
    A = rand(100, 100)
    B = rand(100, 100)
    C = [rand(100, 100) for i in 1:100]
    reset_timer!()
    testmuls(A, B, C)
    show(TimerOutputs.get_defaulttimer())
    reset_timer!()
    testmuls2(A, B, C)
    show(TimerOutputs.get_defaulttimer())
end

bar()
```
Result:

```
Threads.nthreads() = 5
 ───────────────────────────────────────────────────────────────────────
                               Time                    Allocations
                      ───────────────────────   ────────────────────────
   Tot / % measured:       6.24s / 100.0%           55.4KiB /  96.7%

 Section      ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────
 testmul0         20    6.10s   97.9%   305ms     0.00B    0.0%    0.00B
 testmul_th       20    131ms    2.1%  6.57ms   53.6KiB  100.0%  2.68KiB
 ───────────────────────────────────────────────────────────────────────
 ───────────────────────────────────────────────────────────────────────
                               Time                    Allocations
                      ───────────────────────   ────────────────────────
   Tot / % measured:       6.32s / 100.0%           55.8KiB /  96.8%

 Section      ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────
 testmul0         20    6.10s   96.6%   305ms     0.00B    0.0%    0.00B
 testmul_th       20    218ms    3.4%  10.9ms   54.0KiB  100.0%  2.70KiB
 ───────────────────────────────────────────────────────────────────────
```
`testmul0` is just a time-consuming calculation.

In the function `testmuls`, I call `testmul0` repeatedly in one `for` loop and then `testmul_th` repeatedly in a second `for` loop, and `testmul_th` takes only 6.57 ms on average. But when I call `testmul0` and `testmul_th` alternately in a single `for` loop (`testmuls2`), the average time of `testmul_th` increases to 10.9 ms.
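The two call patterns can also be reproduced without TimerOutputs, using only Base and the LinearAlgebra standard library. This is a minimal sketch rather than the original benchmark: `serial_work!`, `threaded_work!`, `separate_loops` and `interleaved` are hypothetical stand-ins for `testmul0`, `testmul_thread`, `testmuls` and `testmuls2`, and `@elapsed` replaces `@timeit`, so the absolute numbers will differ.

```julia
using LinearAlgebra
LinearAlgebra.BLAS.set_num_threads(1)  # keep BLAS single-threaded, as in the demo

# Hypothetical stand-in for testmul0: 1000 serial products into one matrix
function serial_work!(C, A, B)
    for _ in 1:1000
        mul!(C, A, B)
    end
    return C
end

# Hypothetical stand-in for testmul_thread: one product per output matrix
function threaded_work!(Cs, A, B)
    Threads.@threads for i in eachindex(Cs)
        mul!(Cs[i], A, B)
    end
    return Cs
end

# Pattern 1: all serial calls first, then all threaded calls
function separate_loops(A, B, Cs; n = 20)
    for _ in 1:n
        serial_work!(Cs[1], A, B)
    end
    t = 0.0
    for _ in 1:n
        t += @elapsed threaded_work!(Cs, A, B)
    end
    return 1000 * t / n  # mean milliseconds per threaded call
end

# Pattern 2: serial and threaded calls alternate in one loop
function interleaved(A, B, Cs; n = 20)
    t = 0.0
    for _ in 1:n
        serial_work!(Cs[1], A, B)
        t += @elapsed threaded_work!(Cs, A, B)
    end
    return 1000 * t / n  # mean milliseconds per threaded call
end

A = rand(100, 100)
B = rand(100, 100)
Cs = [zeros(100, 100) for _ in 1:100]

# Warm-up so that compilation is not included in the timings
separate_loops(A, B, Cs; n = 1)
interleaved(A, B, Cs; n = 1)

t_sep = separate_loops(A, B, Cs)
t_int = interleaved(A, B, Cs)
@show Threads.nthreads() t_sep t_int
```

Start Julia with several threads (for example `julia -t 5`) so that `Threads.@threads` actually runs in parallel.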
So why does the calculation in `testmul0` affect the time of `testmul_th`?
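One guess, which is only an assumption and not something the timings above prove: while `testmul0` keeps a single thread busy for about 300 ms, the other worker threads sit idle and may stop spinning and go to sleep (the Julia manual documents a `JULIA_THREAD_SLEEP_THRESHOLD` environment variable that controls this), so the next `Threads.@threads` loop would pay to wake them. A minimal sketch to check this runs the threaded loop twice after each serial phase and times each call separately; if the first call is consistently slower than the second, the extra cost is tied to the serial-to-parallel transition rather than to the threaded loop itself. The function names are hypothetical, and only Base and LinearAlgebra are used.

```julia
using LinearAlgebra
LinearAlgebra.BLAS.set_num_threads(1)

# Hypothetical stand-in for testmul0: long serial phase on one thread
function serial_work!(C, A, B)
    for _ in 1:1000
        mul!(C, A, B)
    end
    return C
end

# Hypothetical stand-in for testmul_thread
function threaded_work!(Cs, A, B)
    Threads.@threads for i in eachindex(Cs)
        mul!(Cs[i], A, B)
    end
    return Cs
end

# After each serial phase, time the first and the second threaded call separately
function first_vs_second(A, B, Cs; n = 20)
    t1 = 0.0
    t2 = 0.0
    for _ in 1:n
        serial_work!(Cs[1], A, B)
        t1 += @elapsed threaded_work!(Cs, A, B)  # right after the serial phase
        t2 += @elapsed threaded_work!(Cs, A, B)  # threads were just used
    end
    return (first_ms = 1000 * t1 / n, second_ms = 1000 * t2 / n)
end

A = rand(100, 100)
B = rand(100, 100)
Cs = [zeros(100, 100) for _ in 1:100]

first_vs_second(A, B, Cs; n = 1)  # warm-up / compilation
res = first_vs_second(A, B, Cs)
@show Threads.nthreads() res
```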
Thank you very much.