Trivial question about avoiding race conditions in a matrix shared between processes

Good evening everyone.

As the title suggests, I have a very trivial question to ask (I’m a beginner with Julia).

In my code, I would like to have several (parallel) processes do calculations (quite long and complex) and each produce a matrix.

Then I would like each of these processes to add its own matrix into a single matrix. In practice, I would like to do a parallel reduction over an entire matrix instead of a single variable.

I had thought of using SharedArrays: declare a single shared matrix outside the parallel loop that is distributed among the processes, and have each process add its matrix elements to this shared matrix. From what I have read, though, this is not reliable, since several processes updating the same elements at the same time would create a race condition.

I hope it is clear what I intend to do, but if not I will try to explain myself better.

Thank you for your help!

You can always do that “manually”: create a vector of matrices of length nthreads, have each thread add to its own matrix, and combine them into the final result only at the end.

Something like:

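julia> function parallel_add_random(M,N)
          m, n = size(M)
          # one accumulator matrix per thread, so threads never write to the same array
          M_threaded = [ zeros(m,n) for i in 1:Threads.nthreads() ]
          Threads.@threads for i in 1:N
             ithread = Threads.threadid()
             # accumulate in place into this thread's own matrix
             M_threaded[ithread] .+= rand(m,n)
          end
          # combine the per-thread partial sums only once, at the end
          return M + sum(M_threaded)
       end
parallel_add_random (generic function with 1 method)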

julia> m = rand(20,30);

julia> parallel_add_random(m,100)
20×30 Array{Float64,2}:
 55.2541  48.5893  52.4891  52.7939  49.7074  53.4088  53.327   …  51.5987  56.7113  53.5566  52.6172  44.2602  53.0649  51.0608

I understand that this is not necessarily the best approach, because the M_threaded vector of matrices is constructed contiguously in memory, and that is a constraint the parallel execution does not require (so another memory layout may work better), but whether that makes any difference has to be checked for each problem.
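Since the question is about processes rather than threads, the same idea can also be sketched with the Distributed standard library: the `@distributed` macro accepts a reducer, and `+` works on whole matrices, so each worker only sums its own results and the partial sums are combined at the end, with no shared array involved. This is only a minimal sketch under some assumptions: `expensive_matrix` and `distributed_sum` are hypothetical names, the former standing in for the real long calculation, and the workers are assumed to have been added before the `@everywhere` line runs.

```julia
using Distributed

# Assumption: workers were added beforehand, e.g. with addprocs(4) or by
# starting Julia as `julia -p 4`. With no workers, the loop runs on the
# main process.

# Hypothetical stand-in for the long calculation: returns one m×n matrix.
# @everywhere defines it on every process that exists at this point.
@everywhere expensive_matrix(i, m, n) = fill(Float64(i), m, n)

function distributed_sum(m, n, N)
    # The value of the last expression of each iteration is combined with (+).
    # Each process reduces its own share of the iterations, and only the
    # partial sums are sent back to the caller.
    return @distributed (+) for i in 1:N
        expensive_matrix(i, m, n)
    end
end

total = distributed_sum(20, 30, 100)
```

Here every entry of `total` is the sum of the per-iteration values. With a reducer, `@distributed` waits for the result, so no explicit synchronization is needed.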


Thank you, leandro, for your answer.

Actually, right after asking this question, it occurred to me that there is a very simple way around the problem (even if it is not the most efficient one).

Anyway, your answer was helpful. Have a nice day!
