Generally, it is best to avoid intercore communication as much as possible. Apart from performance/scaling, this is to protect your sanity: concurrent execution is hard to reason about, and it is much easier to just have non-interacting parts.
The `Threads.@threads` macro is unfortunately terrible, and I really recommend against using it for anything other than one-off scripts. It makes applications like yours a chore, it does not compose (if there is opportunity for parallelism inside of `doMyStuff` or outside of `tuneParameters!`), and it gives completely incorrect intuitions about what is going on. Furthermore, you always risk inadvertently hammering your intercore communication by having separate threads accumulate into separate slots of an array that share a cache line (cf. e.g. Random number and parallel execution - #19 by foobar_lv2).
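To make the cache-line point concrete, here is a minimal sketch, not taken from the linked thread. The names `chunks_of`, `sum_shared_slots` and `sum_local_accumulators` are made up for illustration. Both functions return the same sum; the first updates adjacent array slots inside the hot loop, the second keeps a local accumulator and writes to the array once per chunk. How much the difference matters depends on the hardware and on what the compiler does with the loop, so benchmark on your own machine.

```julia
using Base.Threads

# Hypothetical helper: split the indices of `xs` into at most `n` contiguous chunks.
chunks_of(xs, n) = collect(Iterators.partition(eachindex(xs), max(1, cld(length(xs), n))))

# Pattern to avoid: every chunk does a read-modify-write on its own slot of `acc`
# inside the hot loop. Neighbouring Float64 slots usually sit on the same cache
# line, so the cores may keep stealing that line from each other.
function sum_shared_slots(xs)
    chunks = chunks_of(xs, nthreads())
    acc = zeros(Float64, length(chunks))
    @threads for k in eachindex(chunks)
        for i in chunks[k]
            acc[k] += xs[i]
        end
    end
    return sum(acc)
end

# Safer pattern: accumulate into a local variable and touch the shared array
# only once per chunk, after the loop is done.
function sum_local_accumulators(xs)
    chunks = chunks_of(xs, nthreads())
    acc = zeros(Float64, length(chunks))
    @threads for k in eachindex(chunks)
        s = 0.0
        for i in chunks[k]
            s += xs[i]
        end
        acc[k] = s
    end
    return sum(acc)
end
```

Both variants are free of data races, since every chunk owns its slot; the difference is only in how often the shared cache line gets written.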
Base offers the much better `@spawn`:
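    import Distributed   # only for splitrange, which is not exported (treat it as internal)
    using BenchmarkTools

    par1 = rand(Int64, 1000)
    par2 = rand(1000)
    doMyStuff(par1, par2) = abs(0 - (-par1^3 + par1^2 + par1 + par2^3 - par2^2 + par2 + 10))

    function tuneParametersB(par1, par2)
        todo = Task[]
        # one contiguous index range of par1 per thread
        for ran in Distributed.splitrange(1, length(par1), Threads.nthreads())
            t = Threads.@spawn begin
                # task-local best value and parameters: nothing shared in the hot loop
                let be = Inf64, bp1 = -1, bp2 = -1.0
                    @inbounds for i in ran
                        pi = par1[i]   # local variable, shadows the constant pi
                        for pj in par2
                            res = doMyStuff(pi, pj)
                            if res < be
                                be, bp1, bp2 = res, pi, pj
                            end
                        end
                    end
                    (be, bp1, bp2)
                end
            end # spawn
            push!(todo, t)
        end
        # serial reduction over the per-task results
        be, bp1, bp2 = (Inf64, -1, -1.0)
        for t in todo
            be_, bp1_, bp2_ = fetch(t)
            if be_ < be
                be, bp1, bp2 = be_, bp1_, bp2_
            end
        end
        be, bp1, bp2
    end

    # interpolate the globals with $ so the benchmark does not measure global-variable overhead
    @btime tuneParametersB($par1, $par2)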
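On the composability point: because `@spawn` creates tasks that the scheduler distributes over the available threads, a function that spawns can itself be called from inside another spawned task. A small sketch under that assumption follows; `psum` and its `cutoff` keyword are hypothetical names, and the cutoff value is an untuned guess.

```julia
using Base.Threads

# Divide-and-conquer sum: each level spawns the left half as a new task and
# handles the right half itself. Nested spawns are fine with task-based parallelism.
function psum(xs, lo = firstindex(xs), hi = lastindex(xs); cutoff = 10_000)
    if hi - lo + 1 <= cutoff
        # small enough: plain serial loop with a local accumulator
        s = zero(eltype(xs))
        for i in lo:hi
            s += xs[i]
        end
        return s
    end
    mid = (lo + hi) >>> 1
    left = @spawn psum(xs, lo, mid; cutoff = cutoff)
    right = psum(xs, mid + 1, hi; cutoff = cutoff)
    return fetch(left) + right
end
```

Calling `psum` from inside another `@spawn` block, or from several tasks at once, needs no changes to the function.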
Alternatively, you can use a library for parallel mapreduce, as previous posters recommended; or you can use `Threads.@threads` to multithread over the presplit ranges. (In real life the code would not look as ugly; I just copy-pasted from the REPL.)

(The `Threads.@threads` macro predates the fancy new multithreading support; that’s why it’s so terrible compared to the new task-based parallelism.)
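For completeness, a rough sketch of the `Threads.@threads`-over-presplit-ranges variant. This is an illustration, not code from the thread: `tuneParametersC` is a made-up name, the ranges come from `Iterators.partition` instead of `Distributed.splitrange`, and the result buffer assumes `Int` entries in `par1` and `Float64` entries in `par2`.

```julia
using Base.Threads

# Same objective as in the post, repeated so the block is self-contained.
doMyStuff(par1, par2) = abs(0 - (-par1^3 + par1^2 + par1 + par2^3 - par2^2 + par2 + 10))

function tuneParametersC(par1, par2)
    # presplit the indices of par1 into at most nthreads() contiguous ranges
    chunklen = max(1, cld(length(par1), nthreads()))
    ranges = collect(Iterators.partition(eachindex(par1), chunklen))
    # assumption: Int parameters in par1, Float64 parameters in par2
    results = Vector{Tuple{Float64,Int,Float64}}(undef, length(ranges))
    @threads for k in eachindex(ranges)
        # loop-local best; the shared array is written once per range
        be, bp1, bp2 = Inf64, -1, -1.0
        for i in ranges[k]
            p1 = par1[i]
            for p2 in par2
                res = doMyStuff(p1, p2)
                if res < be
                    be, bp1, bp2 = res, p1, p2
                end
            end
        end
        results[k] = (be, bp1, bp2)
    end
    # serial reduction over the per-range results
    best = (Inf64, -1, -1.0)
    for r in results
        if r[1] < best[1]
            best = r
        end
    end
    return best
end
```

The structure is the same as in the `@spawn` version: split once, keep all hot-loop state local, and combine the few per-range results serially at the end.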