As a fork of this thread, I created an MWE that illustrates a behaviour I have seen frequently and which, clearly, I do not understand.
First, let us sum serially and in parallel some simple computation that does not involve random numbers:
function sum_notrand_serial(n)
    s = 0.
    for i in 1:n
        s += exp(1/i)
    end
    s
end
function sum_notrand_parallel(n)
    nthreads = Threads.nthreads()
    s = zeros(nthreads)
    n_per_thread = n ÷ nthreads
    Threads.@threads for i in 1:nthreads
        for j in 1:n_per_thread
            s[i] += exp(1/((i-1)*n_per_thread + j))
        end
    end
    sum(s)
end
Using 4 threads, I get:
julia> @btime sum_notrand_serial(10_000)
128.227 μs (0 allocations: 0 bytes)
10010.865744767296
julia> @btime sum_notrand_parallel(10_000)
44.106 μs (22 allocations: 3.11 KiB)
10010.865744767287
That is a reasonable speedup (roughly 2.9x on 4 threads).
Now let us compute another sum, but instead of that exp function, I will just sum random numbers:
function sum_rand_serial(n)
    s = 0.
    for i in 1:n
        s += rand()
    end
    s
end
function sum_rand_parallel(n)
    nthreads = Threads.nthreads()
    s = zeros(nthreads)
    n_per_thread = n ÷ nthreads
    Threads.@threads for i in 1:nthreads
        for j in 1:n_per_thread
            s[i] += rand()
        end
    end
    sum(s)
end
Results in:
julia> @btime sum_rand_serial(10_000)
33.078 μs (0 allocations: 0 bytes)
4984.942412847785
julia> @btime sum_rand_parallel(10_000)
47.776 μs (22 allocations: 3.11 KiB)
4998.453249343397
The parallel version is slower than the serial one.
What is going on? When does this happen, and how can it be avoided?
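One observation from the numbers above: both parallel versions take about the same time (44 μs and 48 μs), so part of the answer may simply be that the fixed cost of launching the threads is comparable to the whole 33 μs serial rand loop. Another suspect is that every thread keeps writing to neighbouring elements of the same small array s. A minimal sketch of a variant that could help test the second guess is below. The name sum_rand_parallel_local is hypothetical, I have not benchmarked it, and it assumes the repeated writes to s[i] matter; it only changes where the partial sum is accumulated.

```julia
# Hypothetical variant of sum_rand_parallel: each iteration of the threaded
# loop accumulates into a local variable and writes to the shared array
# only once, at the end.
function sum_rand_parallel_local(n)
    nchunks = Threads.nthreads()
    s = zeros(nchunks)
    n_per_chunk = n ÷ nchunks   # as in the original, any remainder is dropped
    Threads.@threads for i in 1:nchunks
        acc = 0.0               # local accumulator, not shared between threads
        for j in 1:n_per_chunk
            acc += rand()
        end
        s[i] = acc              # single write per chunk
    end
    sum(s)
end
```

If this version still does not beat the serial one at n = 10_000, that would point towards threading overhead or the random number generator rather than the shared array, and repeating the comparison with a much larger n would be the next thing to check.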