The issue is not a race condition; I didn't see one during a quick read. You have misunderstood the fundamental difference between using processes (via Distributed.jl) and threads. You cannot just slap `@distributed` on the code and expect it to just work. You need to think about moving data between workers.
Let me try to explain:
Workers have totally separate memory spaces on your machine; in fact, for all practical purposes they could live on physically different machines. So when you preallocate the `X123` matrices outside the loop, this happens on the main process. When you access a variable inside `@distributed` (line (1)) that is declared outside, Distributed.jl copies it to the worker process, and the worker writes into its local copy. Thus you have no data race, but the information stays local to that process! It also means that line (3) does not move data back to the main process. What happens is that the empty array `x` is copied to each process, each process writes something into its local version, and then that version gets garbage collected, because you never transport it back.
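Here is a minimal sketch of the pitfall (with a hypothetical array `x`, not your actual code): the main process never sees the workers' writes.

```julia
using Distributed
addprocs(2)

x = zeros(5)                 # allocated on the main process

@sync @distributed for i in 1:5
    x[i] = i                 # each worker writes into its own copy of x
end

@show x                      # still all zeros: the copies were never sent back
```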
So what should you do? (A sketch putting these steps together follows the list.)
1.) Partition your tasks (e.g. using ChunkSplitters.jl) into as many chunks as you have workers
2.) Use `@distributed` to distribute the chunks
3.) Preallocate the necessary arrays `X123` on each worker
4.) Perform the workload on each worker
5.) Write the results to a `SharedArray` from SharedArrays.jl
6.) If there are large constant arrays used as input for the calculations, consider using a `SharedArray` for those as well, to avoid copying them to every worker.
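A minimal sketch of steps 1–6, assuming a recent ChunkSplitters.jl (where `index_chunks` partitions an index range and the result can be indexed); `ntasks`, `results`, `X`, and the fill!/sum workload are placeholders standing in for your actual sizes, arrays, and algebra:

```julia
using Distributed
addprocs(4)

@everywhere using ChunkSplitters, SharedArrays

ntasks = 1_000
results = SharedArray{Float64}(ntasks)  # shared memory: writes here survive (step 5)

nchunks = nworkers()                    # one chunk per worker (step 1)

@sync @distributed for c in 1:nchunks   # distribute the chunks (step 2)
    idxs = index_chunks(1:ntasks; n = nchunks)[c]
    X = zeros(3, 3)                     # preallocated once per chunk, worker-local (step 3)
    for i in idxs
        fill!(X, i)                     # placeholder workload (step 4); your algebra goes here
        results[i] = sum(X)
    end
end
```

For step 6, a large constant input could likewise be created as a `SharedArray` on the main process and read inside the loop; since all local workers map the same memory, it is not copied to each of them.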
Side note:
I think you could likely optimize your algebra code quite a bit, and I thought you got some input on that in an earlier thread here. Did you perhaps base this on an older version of your code?