Problem understanding multi-threading

I fail to see why the following piece of code does not show the expected results. For every i-th iteration, the values of A, B, C and D are correct, but the matrix FF sometimes contains values that do not match the values of A, B, C and D at the current iteration.

n1, n2 = 2, 2
a = rand(n1, n2)
b = rand(n1, n2)
c = rand(n1, n2)
d = rand(n1, n2)

function main(a, b, c, d)
    n1 = size(a, 1)
    n2 = size(a, 2)
    FF = Array{Float32}(undef, 2, 2)
    aa = vec(a)
    bb = vec(b)
    cc = vec(c)
    dd = vec(d)

    Threads.@threads for ii = 1:n1*n2
        A = aa[ii]
        B = bb[ii]
        C = cc[ii]
        D = dd[ii]
        FF .= [A B; C D]

        println("i=$ii |--> A=$A, B=$B, C=$C, D=$D")
        println("i=$ii |--> FF = $FF")
    end
end

main(a, b, c, d)

For example, I get:

i=1 |--> A=0.24456115815578272, B=0.8907566253514738, C=0.8642909161063068, 
i=1 |--> FF = Float32[0.18490697 0.8512475; 0.00025189313 0.5059466]
i=2 |--> A=0.5474336567786457, B=0.07123365660482706, C=0.7875180836356128, 
i=2 |--> FF = Float32[0.5474337 0.07123366; 0.7875181 0.46567962]
i=3 |--> A=0.1849069789564115, B=0.8512474692616838, C=0.00025189313264162294, 
i=3 |--> FF = Float32[0.5474337 0.07123366; 0.7875181 0.46567962]
i=4 |--> A=0.22001676024179484, B=0.40712367021744145, C=0.21883641119645225, 
i=4 |--> FF = Float32[0.22001676 0.40712368; 0.21883641 0.5583455]

where the results at i = 1 and i = 3 do not match.

This doesn’t look unexpected to me. Each thread overwrites all elements of FF, and then a bit later, reads these elements to make the string. Quite often, another thread’s writing will happen in between these two.
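
To illustrate the point about the shared FF, here is a minimal sketch in which the 2×2 matrix is created inside the loop body, so each iteration has its own matrix and the printed values always match that iteration. The name `local_per_iteration` and the returned vector of matrices are illustrative assumptions, not part of the original code.

```julia
# Sketch: F is first assigned inside the loop body, so it is local to each
# iteration and cannot be overwritten by another thread.
# `local_per_iteration` is a hypothetical name.
function local_per_iteration(a, b, c, d)
    aa, bb, cc, dd = vec(a), vec(b), vec(c), vec(d)
    out = Vector{Matrix{Float32}}(undef, length(aa))
    Threads.@threads for ii in eachindex(aa)
        F = Float32[aa[ii] bb[ii]; cc[ii] dd[ii]]   # private to this iteration
        println("i=$ii |--> FF = $F")
        out[ii] = F                                 # each iteration writes its own slot
    end
    return out
end

a, b, c, d = rand(2, 2), rand(2, 2), rand(2, 2), rand(2, 2)
mats = local_per_iteration(a, b, c, d)
```

The lines can still appear in any order, since the iterations run concurrently, but each printed matrix belongs to the iteration that printed it.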

To safely write into an array from inside Threads.@threads, you need to ensure different threads never write to the same place. If you only write to FF[ii] then that’s sure to be safe. Sometimes it’s useful to explicitly write to A[Threads.threadid()].
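
The "only write to FF[ii]" idea can be sketched as follows, assuming the goal is to keep one 2×2 matrix per iteration. The function name `fill_per_iteration` and the 2×2×N layout are illustrative choices, not something from the original post.

```julia
# Sketch: give every iteration its own 2×2 slice, so no two iterations
# ever write to the same memory. `fill_per_iteration` is a hypothetical name.
function fill_per_iteration(a, b, c, d)
    aa, bb, cc, dd = vec(a), vec(b), vec(c), vec(d)
    N  = length(aa)
    FF = Array{Float32}(undef, 2, 2, N)   # one slice per iteration
    Threads.@threads for ii in 1:N
        # Iteration ii touches only FF[:, :, ii].
        FF[1, 1, ii] = aa[ii]
        FF[1, 2, ii] = bb[ii]
        FF[2, 1, ii] = cc[ii]
        FF[2, 2, ii] = dd[ii]
    end
    return FF
end

a, b, c, d = rand(2, 2), rand(2, 2), rand(2, 2), rand(2, 2)
FF = fill_per_iteration(a, b, c, d)

# Print after the loop, in order, once all writes have finished.
for ii in axes(FF, 3)
    println("i=$ii |--> FF = ", FF[:, :, ii])
end
```

Printing after the threaded loop also keeps the output in iteration order, which the original version cannot guarantee.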

What @mcabbott said. Actually, I am surprised. Why would you expect them to be the same when several threads are writing to the same memory at the same time? The order in which that happens is nondeterministic. Perhaps you were thinking that the threads run one after the other? Once you know that @threads runs things in parallel, it's logical to expect this to happen, right?

My understanding of parallel computing and of Julia itself is still quite limited. I had a very similar piece of code running well in a MATLAB parfor loop, so I just assumed it would work in a similar way (I believe every thread/core gets a copy of the whole loop and variables?).

I think I’ve managed to solve the problem by following @mcabbott’s suggestion, defining FF as

FF  = Array{Float32}(undef,2,2,nt)

and populating it in the loop as

FF[:,:,Threads.threadid()] .= [aa[ii] bb[ii]; cc[ii] dd[ii]]
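
For comparison, here is a sketch of a variant that does not index by thread id. The docstring of `Threads.threadid` warns that a task may migrate between threads when it yields, so buffers indexed by `threadid()` are not always safe. This version splits the iterations into chunks and gives each chunk its own slice instead. It is an alternative technique, not what was posted above; `chunked_fill`, `nchunks` and the `tr` result vector are hypothetical names chosen for illustration.

```julia
# Sketch: one scratch slice per chunk of iterations, indexed by chunk number
# instead of Threads.threadid(). `chunked_fill` and `nchunks` are hypothetical names.
function chunked_fill(a, b, c, d; nchunks = Threads.nthreads())
    aa, bb, cc, dd = vec(a), vec(b), vec(c), vec(d)
    N  = length(aa)
    nc = min(nchunks, N)                    # never more chunks than iterations
    FF = Array{Float32}(undef, 2, 2, nc)    # one scratch slice per chunk
    tr = zeros(Float32, N)                  # example per-iteration result
    @sync for k in 1:nc
        lo = div((k - 1) * N, nc) + 1       # first index of chunk k
        hi = div(k * N, nc)                 # last index of chunk k
        Threads.@spawn begin
            F = view(FF, :, :, k)           # only this task touches slice k
            for ii in lo:hi
                F .= [aa[ii] bb[ii]; cc[ii] dd[ii]]
                tr[ii] = F[1, 1] + F[2, 2]  # use the scratch matrix
            end
        end
    end
    return tr
end

a, b, c, d = rand(2, 2), rand(2, 2), rand(2, 2), rand(2, 2)
println(chunked_fill(a, b, c, d))
```

Each task reuses its slice for every index in its chunk, which gives the same memory savings as per-thread buffers without relying on which thread runs the task.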