`@spawn` actually assigns tasks to threads that are not busy. Let’s say I have a function `slow(n::Int)` calibrated such that it keeps a core on my machine busy for around 1 ms per unit of `n`:
```julia
using .Threads, BenchmarkTools

# ~1 ms of CPU-bound work per unit of n (calibrated for my machine)
function slow(n)
    res = 0
    for _ in 1:n*2310
        res += sum(sin(1/rand()).^rand(1:5) for _ in 1:10)
    end
    return res
end
```
```julia
julia> @benchmark slow(1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     967.756 μs (0.00% GC)
  median time:      996.769 μs (0.00% GC)
  mean time:        1.026 ms (0.00% GC)
  maximum time:     2.520 ms (0.00% GC)
  --------------
  samples:          4872
  evals/sample:     1

julia> @benchmark slow(1000)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.022 s (0.00% GC)
  median time:      1.028 s (0.00% GC)
  mean time:        1.034 s (0.00% GC)
  maximum time:     1.058 s (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1
```
Then I can check sequential vs parallel execution:
```julia
julia> @time foreach(_->slow(1_000), 1:nthreads())
  9.477004 seconds (104.81 k allocations: 5.884 MiB)

julia> @time @sync foreach(_->(Threads.@spawn slow(1_000)), 1:nthreads())
  1.385115 seconds (117.54 k allocations: 6.567 MiB)

julia> @time @sync @threads for _ in 1:nthreads()
           slow(1_000)
       end
  1.376207 seconds (118.42 k allocations: 6.578 MiB)
```
You see that in a case where all tasks take equally long, there is no difference between `@spawn` and `@threads`. You can also check with `htop` that all CPUs are employed.
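As an extra sanity check, one can record which thread each spawned task actually runs on. A minimal sketch (the task count is arbitrary, and a task reports the thread it happens to start on):

```julia
using .Threads

# Spawn a batch of trivial tasks and collect the id of the thread
# each one ran on; with enough tasks, every id in 1:nthreads()
# should appear.
tasks = [Threads.@spawn threadid() for _ in 1:4*nthreads()]
println("threads used: ", sort(unique(fetch.(tasks))))
```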
Let’s check dynamic scheduling. If I randomize the `n` argument to `slow` uniformly over `1:1000`, a task takes on average 0.5 seconds. So 160 such randomized tasks amount to about 80 seconds of work, which on 8 cores should take roughly 10 seconds if all cores are employed:
```julia
julia> @time @sync for _ in 1:160
           Threads.@spawn slow(rand(1:1000))
       end
 13.303092 seconds (19.02 k allocations: 1.198 MiB)
```
`htop` shows that all 8 cores are employed equally well.
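Note that the loop above throws the results of `slow` away. If the results are needed, the spawned tasks can be kept and `fetch`ed; a minimal sketch (`fetch` blocks until each task is done, so no extra `@sync` is required):

```julia
# Spawn all tasks first, then wait for and collect all 160 results.
tasks = [Threads.@spawn slow(rand(1:1000)) for _ in 1:160]
results = fetch.(tasks)
```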
Let’s check with `@threads`:
```julia
julia> @time @sync @threads for _ in 1:160
           slow(rand(1:1000))
       end
 14.933920 seconds (35.20 k allocations: 1.976 MiB)
```
This takes a bit longer, since toward the end of the computation the CPU load becomes more unbalanced.
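The imbalance comes from `@threads` assigning each thread one contiguous chunk of the 160 iterations up front, so a thread that happens to draw many large `n` values finishes last. Newer Julia versions let you choose the schedule explicitly; a sketch (if I remember the version cutoffs correctly, `:static` needs Julia ≥ 1.5 and `:greedy` needs Julia ≥ 1.11):

```julia
# :static pins one fixed chunk of iterations to each thread;
# :greedy hands iterations to whichever thread becomes free,
# which suits non-uniform workloads like this one.
@time @threads :greedy for _ in 1:160
    slow(rand(1:1000))
end
```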
Now let’s check with `Distributed`’s `pmap`, as suggested by @marius311:
```julia
julia> using Distributed

julia> addprocs();

julia> nprocs()
17

julia> @everywhere function slow(n)
           res = 0
           for _ in 1:n*2310
               res += sum(sin(1/rand()).^rand(1:5) for _ in 1:10)
           end
           return res
       end

julia> @time pmap(_->slow(rand(1:1000)), 1:160);
 10.296406 seconds (236.70 k allocations: 12.383 MiB, 0.09% gc time)
```
OK, now that beats the two approaches above, with `htop` also showing the hyper-threads employed.
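`pmap` also has knobs worth knowing about; for instance, the `batch_size` keyword sends iterations to the workers in batches, which can reduce communication overhead when individual tasks are cheap. A sketch (whether batching helps for ~0.5 s tasks is something to measure, not a given):

```julia
# Same workload, but dispatched to the workers in batches of 4
# instead of one item per message.
@time pmap(_ -> slow(rand(1:1000)), 1:160; batch_size=4);
```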
Two notes of caution:

- You speak of `@spawn` and `workers`, `@everywhere` and `Threads` intermingled. Please note that these are two different concepts of parallel computing in Julia: multithreading within one process versus multiprocessing across worker processes. Better not to mix them together (see the sketch below).
- There are pathological cases where the scheduling of tasks to threads with `@spawn` does not work properly.
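To see that the two layers are independent, compare the thread count of the current process with the number of worker processes; a minimal sketch (the `addprocs` count is arbitrary):

```julia
using Distributed, .Threads

addprocs(2)   # multiprocessing: two extra worker processes
nthreads()    # multithreading: threads in this process (--threads)
nworkers()    # worker processes added via addprocs
```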