I would like to experiment with parallelizing an embarrassingly parallel computation on an iterable, using the new multithreading support in Julia 1.3. I need `map` and `mapreduce`.
I would prefer to leave return-type determination to Julia, so I'd rather not preallocate.
Do I just need to `@spawn` and then `fetch`, as in

```julia
using Base.Threads: @spawn, threadid, nthreads  # `fetch` is exported from Base

@show nthreads()

function ploop(f, itr)
    map(fetch, map(i -> @spawn(f(i)), itr))
end

function pmapreduce(f, op, itr)
    mapreduce(fetch, op, map(i -> @spawn(f(i)), itr))
end

f(i) = (@show threadid(); sleep(rand()); Float64(i))

ploop(f, 1:10)
pmapreduce(f, +, 1:10)
```
(The motivation for this approach is that, in practice, `f` itself can use threads, and the whole thing hopefully composes neatly.)
A robust `pmapreduce` should include a lower threshold below which it falls back to serial execution. That threshold is related to the ratio between the cost of calling `f` and the cost of spawning a task. For example, if `f(x) = x + 1`, then spawning a task is approximately 5000 times more expensive than one call (obviously this is the worst case, but the ratio between the time for one operation and the time to spawn a task can give a good approximation).
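That fallback idea can be sketched as follows. The name `tmapreduce` and the cutoff value are illustrative choices here, not measured or standard:

```julia
using Base.Threads: @spawn

# Hypothetical threshold-based wrapper: run serially when the collection
# is too small to amortize the task-spawn overhead, otherwise spawn one
# task per element as in pmapreduce above. The default cutoff of 2^10 is
# a placeholder; in practice it should come from benchmarking f.
function tmapreduce(f, op, itr; cutoff = 2^10)
    length(itr) < cutoff && return mapreduce(f, op, itr)
    mapreduce(fetch, op, map(i -> @spawn(f(i)), itr))
end

tmapreduce(x -> x + 1, +, 1:100)              # small input: serial path
tmapreduce(x -> x + 1, +, 1:5000; cutoff=10)  # large input: spawns tasks
```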
Spawning one task per operation seems excessive, but that strategy pays off if your operation is expensive enough to overcome the task-spawning overhead (by how much? 10 times?).
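The usual way to sidestep that overhead is to spawn one task per chunk rather than per element, so each task does enough work to pay for its spawn cost. A minimal sketch, assuming `op` is associative and `itr` is nonempty and indexable (the name `chunked_mapreduce` is made up for illustration):

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical chunked variant: split the input into ~ntasks contiguous
# chunks, reduce each chunk serially inside its own task, then combine
# the per-chunk results. Assumes op is associative and itr is nonempty.
function chunked_mapreduce(f, op, itr; ntasks = nthreads())
    chunks = collect(Iterators.partition(itr, cld(length(itr), ntasks)))
    tasks  = map(chunk -> @spawn(mapreduce(f, op, chunk)), chunks)
    mapreduce(fetch, op, tasks)
end

chunked_mapreduce(x -> x^2, +, 1:1000)  # sum of squares of 1:1000
```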
ThreadTools.jl is currently implemented with somewhat expensive computations in mind. There is a benchmark in the README that indicates roughly where the overhead starts becoming excessive.