Your MWE is a bit tricky, since `rand` is not thread safe. One approach, suggested in https://docs.julialang.org/en/v1/manual/parallel-computing/, is to use `Future.randjump` to produce non-overlapping random number sequences for the different threads. There are a couple of ways to parallelize this loop: one is to protect the updates with locks or atomics, another is to have each thread accumulate its own histogram and then add them all together. Here's the first, with atomics, which should be faster than with locks:

```
using Random, Base.Threads, Future

function parHistogram()
    n_bins = 20_000_000
    n_elem = 100_000_000
    hist = [Atomic{Int64}(0) for k = 1:n_bins]
    # one independent RNG per thread, spaced out with randjump
    r = let m = MersenneTwister(1)
        [m; accumulate(Future.randjump, fill(big(10)^20, nthreads()-1), init=m)]
    end
    @threads for i = 1:n_elem
        bin = rand(r[threadid()], UInt64) % n_bins + 1
        atomic_add!(hist[bin], 1)
    end
    hist
end
```

The trouble is, this is much slower than the sequential version, because there’s lots of contention for the updates and not very much work on each iteration. Here’s the second version:

```
using Random, Base.Threads, Future

function parHistogram2()
    n_bins = 20_000_000
    n_elem = 100_000_000
    # one private histogram per thread avoids contention
    hist_t = [zeros(Int64, n_bins) for k = 1:nthreads()]
    r = let m = MersenneTwister(1)
        [m; accumulate(Future.randjump, fill(big(10)^20, nthreads()-1), init=m)]
    end
    @threads for i = 1:n_elem
        bin = rand(r[threadid()], UInt64) % n_bins + 1
        hist_t[threadid()][bin] += 1
    end
    # parallel reduction: sum the per-thread histograms bin by bin
    hist = zeros(Int64, n_bins)
    @threads for i = 1:n_bins
        for j = 1:nthreads()
            hist[i] += hist_t[j][i]
        end
    end
    hist
end
```

This version will get speedup, particularly as you increase the number of threads.
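If you want to measure that speedup yourself, here is a minimal sketch of a sequential baseline; `seqHistogram` is a name I've made up, and I've parameterized the sizes so small sanity checks are cheap:

```
using Random

# Hedged sketch: a single-threaded histogram for timing comparisons.
function seqHistogram(n_bins = 20_000_000, n_elem = 100_000_000)
    hist = zeros(Int64, n_bins)
    r = MersenneTwister(1)
    for i = 1:n_elem
        hist[rand(r, UInt64) % n_bins + 1] += 1
    end
    hist
end
```

Comparing `@time seqHistogram()` against `@time parHistogram2()` (run each twice, to exclude compilation time) should show the gap. Note that the parallel version draws from different RNG streams, so the histograms won't match bin for bin, but `sum(hist)` equals `n_elem` in both.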

As for your second question, `@threads` uses a static blocked schedule. To get a dynamic schedule, you will want to create tasks that can be assigned to threads dynamically by a scheduler. You can do this using `Base.Threads.@spawn`, by turning your loop into a recursive function. I plan to post an example doing this within the next week or so.
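In the meantime, here is a minimal sketch of that idea, assuming Julia ≥ 1.3 (where `Threads.@spawn` was introduced). `hist_spawn` and `cutoff` are names I've made up, and each leaf task simply allocates its own freshly seeded RNG rather than using `randjump`, so this only illustrates the divide-and-conquer scheduling pattern:

```
using Random, Base.Threads

# Hedged sketch: recursively split the iteration range; @spawn lets the
# scheduler assign the halves to whichever threads are free.
function hist_spawn(lo, hi, n_bins; cutoff = 1_000_000)
    if hi - lo + 1 <= cutoff
        # leaf: fill a local histogram with a task-local RNG
        h = zeros(Int64, n_bins)
        rng = MersenneTwister()   # for real use, give each task a randjump'd stream
        for i = lo:hi
            h[rand(rng, UInt64) % n_bins + 1] += 1
        end
        return h
    end
    mid = (lo + hi) >>> 1
    t = Threads.@spawn hist_spawn(lo, mid, n_bins; cutoff = cutoff)
    right = hist_spawn(mid + 1, hi, n_bins; cutoff = cutoff)
    return fetch(t) .+ right
end
```

One caveat: each leaf allocates its own `n_bins`-length histogram, so with 20M bins you'd want a large `cutoff` (or per-thread buffers, as in `parHistogram2`) to keep the allocation cost down; the point here is just the recursive `@spawn`/`fetch` structure.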