Map and mapreduce with Threads

I would like to experiment with parallelizing an embarrassingly parallel computation over an iterable using the new multithreading in Julia 1.3. I need parallel versions of map and mapreduce.

I would prefer to leave return type determination to Julia, so I would prefer not to preallocate.

Do I just need to @spawn and then fetch, as in

using Base.Threads: @spawn, threadid, nthreads  # fetch comes from Base

@show nthreads()

# Spawn one task per element eagerly (the inner map), then block on each
# result in order (the outer map). Return eltype is left to inference.
function ploop(f, itr)
    map(fetch, map(i -> @spawn(f(i)), itr))
end

# Same idea for mapreduce: spawn all tasks first, then reduce over the fetched values.
function pmapreduce(f, op, itr)
    mapreduce(fetch, op, map(i -> @spawn(f(i)), itr))
end

f(i) = (@show threadid(); sleep(rand()); Float64(i))

ploop(f, 1:10)
pmapreduce(f, +, 1:10)

(The motivation for this approach is that in practice f itself can use threads, and the whole thing hopefully composes neatly.)

This approach is taken in https://github.com/baggepinnen/ThreadTools.jl
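For reference, ThreadTools exports a tmap that spawns tasks in much the same way; a quick usage sketch (check the package README for the exact API and its variants):

using ThreadTools

tmap(f, 1:10)  # spawns one task per element, like ploop above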


For mapreduce it might make sense to split up the list recursively, so that op can also run in parallel.
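For instance, a minimal divide-and-conquer sketch (the name tmapreduce and the recursion are mine, not from any package; assumes a nonempty, 1-based-indexed vector):

using Base.Threads: @spawn

# Split in half, @spawn one half, recurse into the other half on the
# current task, then combine the two partial results with op.
function tmapreduce(f, op, v::AbstractVector)
    length(v) == 1 && return f(v[1])
    mid = length(v) ÷ 2
    left = @spawn tmapreduce(f, op, view(v, 1:mid))
    right = tmapreduce(f, op, view(v, mid+1:length(v)))
    return op(fetch(left), right)
end

tmapreduce(x -> x^2, +, collect(1:100))  # == sum(abs2, 1:100)

With this shape the reduction tree has depth log2(n), so op calls on different branches can overlap.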

A robust mapreduce should include a lower threshold below which it stays serial. That threshold is related to the ratio between the cost of one call to f and the cost of spawning a task. For example, if f(x) = x + 1, then spawning a task is approximately 5000 times more expensive than the call itself (obviously this is the worst case, but the ratio between the time for one operation and the time to spawn a task gives a good approximation).
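One way to encode that threshold, extending the sketch above (the keyword name basesize and its default are mine):

using Base.Threads: @spawn

# Chunks of basesize or fewer elements fall back to the serial mapreduce,
# amortizing the task-spawn cost over many calls to f.
function tmapreduce(f, op, v::AbstractVector; basesize = 1024)
    length(v) <= basesize && return mapreduce(f, op, v)
    mid = length(v) ÷ 2
    left = @spawn tmapreduce(f, op, view(v, 1:mid); basesize = basesize)
    right = tmapreduce(f, op, view(v, mid+1:length(v)); basesize = basesize)
    return op(fetch(left), right)
end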

Spawning one task per operation seems excessive, but that strategy pays off if each operation is expensive enough to overcome the task-spawning overhead (by how much? 10 times?).
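One rough way to measure that ratio on your own machine (BenchmarkTools is an extra dependency, not part of this thread, and the numbers vary widely between systems):

using BenchmarkTools
using Base.Threads: @spawn

x = 1
@btime fetch(@spawn $x + 1)  # task spawn + schedule + fetch overhead
@btime $x + 1                # the bare operation, for comparison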

Obviously: the actual problem I am working on takes about 2–10 minutes for a single call, and the sleep in the MWE is a stand-in for that.


Then ThreadTools does exactly what you want.

ThreadTools.jl is currently implemented with somewhat expensive computations in mind. There is a benchmark in the README that indicates roughly where the overhead starts to dominate.
