Race condition when writing the same value in a parallel loop

I have a parallel loop similar to the following:

Threads.@threads for (aₓ, bₓ) in collect(Iterators.product(1:N.A, 1:N.B))
	abₓ = joinAb(aₓ, bₓ)
	output1[abₓ], V = myfunc1(abₓ)
	output2[aₓ, bₓ] = myfunc2(V, aₓ, bₓ)	
end

So, in my parallel loop I am iterating over aₓ and bₓ. I combine (aₓ, bₓ) to form the index abₓ. However, multiple values of aₓ and bₓ may map to a single abₓ. Is it safe to assign a value to output1[abₓ] in my parallel loop? In a serial loop, it would just overwrite with the same value, but I am unsure what this would do if two or more threads were attempting to assign the same value to output1[abₓ] simultaneously. Thanks.

It will lead to a race condition. However, since every thread is writing the same value (I suppose myfunc1 does not depend on global state), you should be fine. IIUC, it can be classified as a non-harmful race condition :smiley: The only cost seems to be that you compute the same result more than once, which hurts performance.

function test()
    a = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3]
    b = [0,0,0]

    Threads.@threads for i in a
        b[i] = i
    end

    return b
end
julia> test()
3-element Vector{Int64}:
 1
 2
 3

julia> test()
3-element Vector{Int64}:
 1
 2
 3

julia> test()
3-element Vector{Int64}:
 1
 2
 3

julia> test()
3-element Vector{Int64}:
 1
 2
 3

Julia’s manual seems to suggest that there are no harmless data races in Julia (just as in other languages):

You are entirely responsible for ensuring that your program is data-race free, and nothing promised here can be assumed if you do not observe that requirement. The observed results may be highly unintuitive.

This is a language design choice that most languages make because it is important for enabling competitive performance, even though it’s not very “user-friendly”.

EDIT: Julia doesn’t have a worked-out memory model yet, so this question has no clear answer for now. See here. Until that changes, it’s best to assume there are no benign data races, just to be safe.
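One way to sidestep the question is to make sure every element has exactly one writer. Below is a minimal sketch based on the test function posted earlier in this thread; the name test_unique is made up for illustration, and it assumes the duplicated indices can be collapsed with unique before the loop starts.

```julia
function test_unique()
    a = repeat([1, 2, 3], inner = 10)   # same contents as `a` in test()
    b = [0, 0, 0]

    # Iterate over the distinct indices only, so each b[i] has a single writer
    # and no two tasks ever touch the same element.
    Threads.@threads for i in unique(a)
        b[i] = i
    end

    return b
end
```

This also skips the redundant iterations, at the cost of one serial pass over a to find the distinct values.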

In your particular scenario, assuming myfunc1 is a pure function, assigning the value in your parallel loop would be safe (i.e., from the point of view of correctness). However, this is an empirical/anecdotal conclusion - please see @nsajko's post for the more unpleasant picture :slight_smile:

Now, if myfunc1 is also computationally expensive, you perform duplicate work and waste resources/time.

One way to avoid doing duplicate work would be to hide the myfunc1 calls and output1 updates behind a Channel/Task that keeps track of whether myfunc1 was already called for a specific abₓ value. That way you only spawn tasks that do non-duplicate work, and each output1[abₓ] is written by a single task, so the data race disappears altogether.

The above approach assumes that myfunc1 is expensive enough to be worth the additional overhead of spawning new tasks.
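Here is a rough sketch of what that could look like, using a Dict of Tasks guarded by a lock rather than a Channel. Everything in it is an assumption made for illustration: joinAb, myfunc1 and myfunc2 are stand-ins for the functions in the question, run_dedup and ncalls are made-up names, and the sizes replace N.A and N.B.

```julia
# Hypothetical stand-ins for the functions in the question.
joinAb(a, b) = a + b - 1                      # several (a, b) pairs share one index
const ncalls = Threads.Atomic{Int}(0)         # counts how often myfunc1 really runs
function myfunc1(ab)
    Threads.atomic_add!(ncalls, 1)
    return ab^2, 10 * ab                      # (value for output1, V)
end
myfunc2(V, a, b) = V + a - b

function run_dedup(NA, NB)
    output1 = zeros(Int, NA + NB - 1)
    output2 = zeros(Int, NA, NB)
    tasks = Dict{Int,Task}()                  # one Task per distinct index
    lk = ReentrantLock()                      # guards `tasks` only

    Threads.@threads for (a, b) in collect(Iterators.product(1:NA, 1:NB))
        ab = joinAb(a, b)
        # The first iteration to see `ab` spawns the work; later ones reuse the Task.
        t = lock(lk) do
            get!(tasks, ab) do
                Threads.@spawn begin
                    result, aux = myfunc1(ab)
                    output1[ab] = result      # written exactly once per index
                    aux                       # the Task's return value (V)
                end
            end
        end
        V = fetch(t)                          # waits until that Task has finished
        output2[a, b] = myfunc2(V, a, b)      # each (a, b) slot has a single writer
    end
    return output1, output2
end
```

The lock is held only while looking up or inserting the Task, not while myfunc1 runs, so distinct indices still proceed in parallel. Whether this beats simply recomputing depends on how expensive myfunc1 is compared with the locking and task overhead.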
