As the title suggests, I have a very trivial question to ask (I'm a beginner with Julia).
In my code, I would like to have several (parallel) processes do calculations (quite long and complex) and each produce a matrix.
Then I would like each of these processes to add its own matrix into a single result matrix. In practice, I would like to do a parallel reduction over an entire matrix instead of a single variable.
I had thought of using SharedArrays: declare a single shared matrix, outside the parallel loop, distributed among the processes, and have each process add its matrix elements to it. But from what I have read this is not reliable, since it would create a data race.
I hope it is clear what I intend to do, but if not I will try to explain myself better.
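To illustrate the concern above (a small sketch of my own, using threads as a stand-in for the processes in the question): an unsynchronized `+=` on shared memory is a read-modify-write, so concurrent updates can overwrite one another and increments get lost.

```julia
# Sketch: why unsynchronized accumulation into shared state is a data race.
# `acc[1] += 1.0` is a load, an add, and a store; two threads interleaving
# these steps can each read the same old value and one update is lost.
function racy_sum(n)
    acc = zeros(1)                 # shared accumulator
    Threads.@threads for i in 1:n
        acc[1] += 1.0              # race: not atomic
    end
    return acc[1]
end
```

With a single thread this always returns `n`; with several threads the result can come out smaller, which is exactly the unreliability the question is worried about.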
You can always do that "manually": create a vector of matrices of length `nthreads()`, have each thread accumulate into its own matrix, and add them into the final result only at the end.
Something like:
```julia
julia> function parallel_add_random(M, N)
           m, n = size(M)
           M_threaded = [zeros(m, n) for i in 1:Threads.nthreads()]
           Threads.@threads for i in 1:N
               ithread = Threads.threadid()
               M_threaded[ithread] += rand(m, n)
           end
           return M + sum(M_threaded)
       end
parallel_add_random (generic function with 1 method)

julia> m = rand(20,30);

julia> parallel_add_random(m, 100)
20×30 Array{Float64,2}:
 55.2541  48.5893  52.4891  52.7939  49.7074  53.4088  53.327  …  51.5987  56.7113  53.5566  52.6172  44.2602  53.0649  51.0608
```
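As a side note (my own variant, not part of the original answer): indexing the accumulators by `Threads.threadid()` assumes each iteration stays pinned to one thread, which the scheduler does not guarantee in general. A sketch that avoids this splits `1:N` into one chunk per task and gives each task its own accumulator:

```julia
# Sketch of the same per-task-accumulator idea without threadid():
# one spawned task per chunk, each with a private accumulator,
# summed only at the end. `parallel_add_random_chunked` is a
# hypothetical name for illustration.
function parallel_add_random_chunked(M, N)
    m, n = size(M)
    nt = Threads.nthreads()
    chunks = [i:nt:N for i in 1:nt]       # interleave 1:N across nt chunks
    tasks = map(chunks) do chunk
        Threads.@spawn begin
            acc = zeros(m, n)             # task-local accumulator, no sharing
            for _ in chunk
                acc .+= rand(m, n)
            end
            acc
        end
    end
    return M + sum(fetch, tasks)          # reduce only after all tasks finish
end
```

This keeps the reduction race-free because no two tasks ever touch the same accumulator.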
I understand that this is not necessarily the best approach: `M_threaded` stores its per-thread matrices behind a contiguous vector of references, keeping one full copy alive per thread, and that layout is a choice the parallel execution does not actually require (other memory models could be better). Whether that makes any difference in practice has to be checked for each problem.
Actually, right after asking this question, it occurred to me that there is a very simple way around the problem (even if it is not the most efficient).
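For completeness, since the question asked about processes rather than threads: the `Distributed` standard library can express this reduction directly. `@distributed (+)` runs the loop body on the workers and adds the resulting matrices into a single matrix, with no shared state to race on. In this sketch, `make_matrix` is a hypothetical stand-in for the long, complex per-process computation.

```julia
using Distributed
addprocs(4)  # assumption: 4 worker processes are wanted

# `make_matrix` is a placeholder for the real per-process computation;
# here it just fills an m×n matrix with the iteration index.
@everywhere function make_matrix(i, m, n)
    return fill(Float64(i), m, n)
end

# `@distributed (+)` reduces the matrices produced by the workers
# into one matrix; each worker only ever writes to its own local copy.
function distributed_matrix_sum(N, m, n)
    return @distributed (+) for i in 1:N
        make_matrix(i, m, n)
    end
end
```

Each element of `distributed_matrix_sum(N, m, n)` is `1 + 2 + … + N`, so the reduction is easy to check against the serial answer.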