By the way, there’s no need to do this yourself: BenchmarkTools’ @btime already takes care of it for you.
I’m pretty sure this is yet another example of https://github.com/JuliaLang/julia/issues/15276, due to the closure created by @threads. You can see the problem by doing:
julia> @code_warntype estimate_pi_thread(1000)
Variables
  #self#::Core.Const(estimate_pi_thread)
  nMC::Int64
  threadsfor_fun::var"#25#threadsfor_fun#8"{Float64, Float64, UnitRange{Int64}}
  n_circle@_4::Core.Box
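For intuition, the Core.Box appears whenever a closure captures a variable that is also reassigned. Here is a minimal reproduction of the same inference failure (my own sketch, not code from this thread):

```julia
# `n` is captured by the closure AND reassigned inside it, so Julia wraps it
# in a Core.Box and can no longer infer its type -- the same pattern that the
# closure generated by Threads.@threads creates for n_circle above.
function boxed_counter()
    n = 0
    bump = () -> (n += 1)  # reassignment of the captured `n` forces boxing
    bump()
    bump()
    return n
end
```

Running @code_warntype boxed_counter() shows n flagged as Core.Box, just like n_circle in the threaded function.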
Using the usual Ref trick solves that problem:
function estimate_pi_thread(nMC)
    radius = 1.
    diameter = 2. * radius
    n_circle = Ref(0)
    Threads.@threads for i in 1:nMC
        x = (rand() - 0.5) * diameter
        y = (rand() - 0.5) * diameter
        r = sqrt(x^2 + y^2)
        if r <= radius
            n_circle[] += 1
        end
    end
    return (n_circle[] / nMC) * 4.
end
With that change, I see the following with julia --threads=1 (only one thread to work with):
julia> @btime estimate_pi($nMC2)
88.248 ms (0 allocations: 0 bytes)
3.1414028
julia> @btime estimate_pi_thread($nMC2)
137.082 ms (7 allocations: 576 bytes)
3.1417592
With julia --threads=2, I see:
julia> @btime estimate_pi_thread($nMC2)
114.105 ms (12 allocations: 1.00 KiB)
1.7710592
and with --threads=4, I see:
julia> @btime estimate_pi_thread($nMC2)
110.060 ms (22 allocations: 1.89 KiB)
0.932612
so indeed there’s not much benefit from this particular manner of threading. My guess is that each iteration does too little work for the overhead of threading to pay off.
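Incidentally, the multi-threaded estimates aren’t just slow, they’re wrong (1.77 and 0.93 instead of ≈3.14): n_circle[] += 1 is an unsynchronized read-modify-write, so increments from different threads get lost. A race-free sketch of mine (not code from this thread) that also avoids the boxing, giving each task its own local counter and summing at the end:

```julia
# Per-task accumulation: each spawned task counts hits into a task-local
# integer, and the partial counts are summed afterwards. No shared mutable
# state, so no data race; no captured-and-reassigned variable, so no Core.Box.
# (My sketch -- the point is sampled in the unit square with a radius-0.5
# circle, which is equivalent to the radius-1 / diameter-2 setup above.)
function estimate_pi_tasks(nMC; ntasks = Threads.nthreads())
    chunk = cld(nMC, ntasks)  # iterations per task, rounded up
    tasks = map(1:ntasks) do t
        lo = (t - 1) * chunk + 1
        hi = min(t * chunk, nMC)
        Threads.@spawn begin
            n = 0  # task-local counter: no race, type stays Int
            for _ in lo:hi
                x = rand() - 0.5
                y = rand() - 0.5
                if x^2 + y^2 <= 0.25  # inside the circle of radius 0.5
                    n += 1
                end
            end
            n
        end
    end
    n_circle = sum(fetch.(tasks))
    return 4 * n_circle / nMC
end
```

With enough samples this converges to π regardless of the thread count, since no increments are lost.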