# Race condition when writing the same value in a parallel loop

I have a parallel loop similar to the following:

``````
Threads.@threads for (aₓ, bₓ) in collect(Iterators.product(1:N.A, 1:N.B))
    abₓ = joinAb(aₓ, bₓ)
    output1[abₓ], V = myfunc1(abₓ)
    output2[aₓ, bₓ] = myfunc2(V, aₓ, bₓ)
end
``````

So, in my parallel loop I am iterating over `aₓ` and `bₓ`. I combine `(aₓ, bₓ)` to form the index `abₓ`. However, multiple values of `aₓ` and `bₓ` may map to a single `abₓ`. Is it safe to assign a value to `output1[abₓ]` in my parallel loop? In a serial loop, it would just overwrite with the same value, but I am unsure what this would do if two or more threads were attempting to assign the same value to `output1[abₓ]` simultaneously. Thanks.

It will lead to a race condition. However, since every thread is writing the same value (I suppose `myfunc1` does not depend on global state), you should be fine. IIUC, it can be classified as a non-harmful race condition. You only seem to be computing redundant information and hurting your performance.

``````
function test()
    a = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3]
    b = [0,0,0]

    # Many iterations write the same value to the same b[i] concurrently.
    Threads.@threads for i in a
        b[i] = i
    end

    return b
end
``````
``````
julia> test()
3-element Vector{Int64}:
1
2
3

julia> test()
3-element Vector{Int64}:
1
2
3

julia> test()
3-element Vector{Int64}:
1
2
3

julia> test()
3-element Vector{Int64}:
1
2
3
``````

Julia’s manual seems to suggest that there are no harmless data races in Julia (like in other languages):

You are entirely responsible for ensuring that your program is data-race free, and nothing promised here can be assumed if you do not observe that requirement. The observed results may be highly unintuitive.

This is a language design choice that most languages make because it is important for enabling competitive performance, even though it’s not very “user-friendly”.

EDIT: Julia doesn’t have a worked-out memory model yet, so there is no clear answer to this question yet. See here. So it’s currently best to assume there are no benign data races, just to be safe.
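One way to stay on the safe side is to restructure the loop so that each `output1[abₓ]` is written by exactly one iteration. The following is only a minimal sketch of a two-phase variant, not something taken from the original code: `joinAb`, `myfunc1`, and `myfunc2` are hypothetical stand-ins with made-up bodies, and the helper array `Vs` and the function name `two_phase` are assumptions introduced for illustration.

```julia
using Base.Threads: @threads

# Hypothetical stand-ins for the functions in the question.
joinAb(a, b) = a * b               # several (a, b) pairs map to one index
myfunc1(ab) = (ab^2, ab + 1)       # returns (entry for output1, V)
myfunc2(V, a, b) = V + a + b

function two_phase(NA, NB)
    index_pairs = collect(Iterators.product(1:NA, 1:NB))
    ab_unique = unique(joinAb(a, b) for (a, b) in index_pairs)

    output1 = zeros(Int, maximum(ab_unique))
    Vs = zeros(Int, maximum(ab_unique))   # assumed storage for V per index
    output2 = zeros(Int, NA, NB)

    # Phase 1: every distinct index is visited by exactly one iteration,
    # so no two iterations write to the same element.
    @threads for ab in ab_unique
        output1[ab], Vs[ab] = myfunc1(ab)
    end

    # Phase 2: Vs is only read here; each output2[a, b] is written once.
    @threads for (a, b) in index_pairs
        output2[a, b] = myfunc2(Vs[joinAb(a, b)], a, b)
    end
    return output1, output2
end

output1, output2 = two_phase(4, 5)
```

Because `@threads` waits for all iterations to finish before returning, phase 2 only starts once every `Vs[ab]` has been filled in. As a side effect, `myfunc1` runs once per distinct index rather than once per `(aₓ, bₓ)` pair.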

In your particular scenario, assuming `myfunc1` is a pure function, assigning the value in your parallel loop would be safe (i.e., from the point of view of correctness). However, this is an empirical/anecdotal conclusion; see @nsajko's post for the less pleasant picture.

Now, if `myfunc1` is also computationally expensive, you perform duplicate work and waste resources/time.

One way to avoid doing duplicate work would be to hide the `myfunc1` calls and `output1` updates behind a `Channel`/`Task` that keeps track of whether `myfunc1` was already called for a specific `abₓ` value. This way, you only spawn tasks that do non-duplicate work, and you also avoid the data race altogether.

The above approach assumes that your `myfunc1` function is expensive enough to be worth the additional overhead of spawning new tasks.
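A rough sketch of that idea, using a lock-guarded `Dict` of tasks rather than a `Channel`. This is one possible shape under stated assumptions, not a definitive implementation: `joinAb`, `myfunc1`, and `myfunc2` are hypothetical stand-ins with made-up bodies, and the function name and the `ncalls` counter exist only to make the behavior visible.

```julia
using Base.Threads: @spawn, @threads

# Hypothetical stand-ins for the functions in the question.
joinAb(a, b) = a * b               # several (a, b) pairs map to one index
myfunc1(ab) = (ab^2, ab + 1)       # returns (entry for output1, V)
myfunc2(V, a, b) = V + a + b

function run_once_per_index(NA, NB)
    output1 = zeros(Int, NA * NB)
    output2 = zeros(Int, NA, NB)
    tasks = Dict{Int,Task}()            # one task per distinct index
    lk = ReentrantLock()                # guards access to `tasks`
    ncalls = Threads.Atomic{Int}(0)     # counts myfunc1 calls (illustration only)

    @threads for (a, b) in collect(Iterators.product(1:NA, 1:NB))
        ab = joinAb(a, b)
        # Look up or create the task while holding the lock,
        # so at most one task is ever spawned per index.
        t = lock(lk) do
            get!(tasks, ab) do
                @spawn begin
                    Threads.atomic_add!(ncalls, 1)
                    val, V = myfunc1(ab)
                    output1[ab] = val   # only this task writes output1[ab]
                    V
                end
            end
        end
        V = fetch(t)                    # wait for the shared result
        output2[a, b] = myfunc2(V, a, b)
    end
    return output1, output2, ncalls[]
end

output1, output2, n = run_once_per_index(4, 5)
```

Here `n` equals the number of distinct `abₓ` values rather than the number of `(aₓ, bₓ)` pairs. The lock and the task bookkeeping add overhead on every iteration, which is why this only pays off when `myfunc1` is expensive.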
