# Help understanding what happens in a `@threads` loop with race conditions

I tried something silly that went wrong, and I'm trying to get a better understanding of exactly why it happens. Consider the following:

```julia
using Base.Threads

function mask!(v, M)
    @threads for j ∈ 1:size(M,2)
        for i ∈ 1:size(M,1)
            iszero(v[i]) && continue
            v[i] *= M[i,j]
        end
    end
    v
end

const n = 10^6

v = ones(Int, n)
M = rand(0:1, n, 10)
```

Clearly there are race conditions in this loop. My instinct was that it should, nevertheless, always return the same results. My reasoning was the following:

• For each element of `v`, we are checking whether any of the corresponding elements of `M` are zero. If any are, we set that element of `v` to zero; that's it.
• Once an element of `v` is set to zero, there should be no way to change it back.
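For reference, the intent of the loop can be written sequentially (a minimal sketch; `mask_seq!` is a hypothetical name, not from the original post):

```julia
# Sequential reference: v[i] ends up 0 unless every M[i,j] is nonzero.
function mask_seq!(v, M)
    for j ∈ 1:size(M,2), i ∈ 1:size(M,1)
        iszero(v[i]) && continue
        v[i] *= M[i,j]
    end
    v
end
```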

Of course, I should never have even considered doing anything like this in the first place. I'm not posting here to say that I want to do this; I'm posting to try to understand exactly what happens.

My current guess is that the elements of `v` are getting cached somewhere (perhaps in L1 or L2), so when other threads set them to 0 this can sometimes be missed.

For a given `v` and `M`, the results are identical whether run with or without `@threads`:
if `v[i]` is not 0, then `v[i] = 1 * M[i,j]`; but `M[i,j]` can be 1 for all `j`, so `v[i]` stays 1.

Correct me if I missed something in your algorithm.

```julia
julia> findall(!iszero, v)
951-element Array{Int64,1}:
 1543
 2694
 ...

julia> M[1543,:]
10-element Array{Int64,1}:
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
```

Or better, can you elaborate more on the error case you see?
(Edited the above, because I over-interpreted and thought you expected all of `v` to be 0.)

This is definitely not true.

```julia
using Random

function test(seed=999, n=n, m=10)
    v = ones(Int, n)
    Random.seed!(seed)
    M = rand(0:1, n, m)
    count(!iszero, mask!(v, M))  # how many entries of v survive as nonzero
end
```

## With `@threads`

```julia
julia> test()
3180

julia> test()
3787

julia> test()
1955

julia> test()
3822

julia> test()
2080
```

## Without `@threads`

```julia
julia> test()
954

julia> test()
954

julia> test()
954

julia> test()
954

julia> test()
954
```

Again, my question is more "what's happening?" than "how do I do this?". If I had to write this again (my original use case was more complicated, but the above is an MWE), I would make sure that each element of `v` is owned by exactly one thread.

Yes, that's right, but my point was that if any of the `M[i,j]` are `0`, then `v[i]` should be set to `0`. Only when `M[i,j]` is one for all `j` do I expect `v[i]` to be nonzero. This works without `@threads`, but fails with it.
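In other words (a sketch of the intended result, not code from the thread): since the entries of `M` are 0 or 1, the expected final `v` is just the row-wise product of `M`:

```julia
# Expected result: v[i] is 1 iff every entry in row i of M is 1.
expected = vec(prod(M; dims=2))
```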

Note that the following code correctly gives the same result as the sequential version of `mask!`:

```julia
function mask2!(v, M)
    @threads for i ∈ 1:size(M,1)
        all(m > 0 for m ∈ M[i,:]) && continue
        v[i] = 0
    end
    v
end
```

You are right, I was able to reproduce your outcome.

Is it possible that the data for the concurrent threads is split over index `j`, i.e., over the columns (10 of them), so the threads don't see the complete row for `v[i] *= M[i,j]`?
Is this the race condition you are seeing?
So I would say the random results are to be expected: whichever thread is last for a given `v[i]` decides whether `v[i]` is 0 or 1. Nothing is missed in any cache; it's just that the algorithm in the example above is not thread safe.
If you change the order with:

```julia
@threads for i in 1:size(M,1)
    for j in 1:size(M,2)
```

it is thread safe and the results are always equal to those of the non-threaded version.
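A complete version of that reordered loop might look like this (a sketch; `mask3!` is a hypothetical name):

```julia
# Each i is handled by exactly one thread, so no two threads touch the same v[i].
function mask3!(v, M)
    @threads for i in 1:size(M,1)
        for j in 1:size(M,2)
            iszero(v[i]) && break   # once v[i] is zero, this row is done
            v[i] *= M[i,j]
        end
    end
    v
end
```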

`v[i]` can be set to 0 by thread 1 while thread 2 still sees the 1. Yes, this is because of caching.

Yeah, this is my suspicion as well. It would be great if one of the compiler folks could come along and give more explicit details on how Julia decides to cache those values, if such details are even knowable.

Yes, it would be interesting, but I don't think compilers decide about those caches (L1, L2, L3).

Here is a good starting point. I just followed some links and took an overview, but it's a topic I won't go too deep into; too time consuming and not much to gain (except for the knowledge):

(I’m just a curious learner on this topic so please don’t consider what I say as an expert opinion. Having said that…)

I think this is expected. Although Julia does not specify its memory model, I guess it's safe to assume it's aiming to provide sequential consistency for data-race-free programs (SC for DRF), like C, C++, and Java do. In that case, as soon as you write a data race, your program's behavior is undefined and you cannot expect any sane behavior. This is because, unless you use some synchronization protocol (e.g., atomics), the compiler and the hardware (CPU/cache) are free to transform (optimize) your program into something else as long as the difference is not observable in single-threaded execution. For example, the compiler might change the inner loop to load four elements from `v` at a time, do `v[i] *= M[i,j]`, and then write back all four elements even if they were not modified.
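To illustrate that last point (a hypothetical transformation, not actual compiler output; `masked_col_widened!` is an invented name): a widened version of the inner loop could look like the sketch below. Unmodified elements are written back anyway, so under concurrency the blanket write-back can overwrite a `0` that another thread just stored.

```julia
# Hypothetical widened loop over one column of M: process v in chunks of 4.
# Sequentially this is equivalent to the scalar loop, but the write-back of
# untouched elements is what makes it unsafe in the presence of other threads.
function masked_col_widened!(v, col)
    n = length(v)
    for i in 1:4:n-3
        chunk = v[i:i+3]                # load four elements at once
        for k in 1:4
            iszero(chunk[k]) || (chunk[k] *= col[i+k-1])
        end
        v[i:i+3] = chunk                # write back all four, modified or not
    end
    for i in (n - n % 4 + 1):n          # scalar remainder loop
        iszero(v[i]) || (v[i] *= col[i])
    end
    v
end
```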

FYI, I think this two-part talk is a nice introduction to this topic. In fact, my reply is solely based on what I learnt from it:

To write `mask!`, I’d just use threaded map over the first axis of `M`, even though it makes the memory access of the inner loop non-contiguous. Of course, it’d be nice to store the transposed version of `M` if other parts of the program are OK with it.


Hi,

```julia
v[i] *= M[i,j]
```

is not an atomic operation, so this is expected. For simplicity, imagine you only have two threads, and `M` is just 1×4: `[1 0 0 1]`.

Since the work is split between threads by columns, thread 1 will take care of the first two columns `[1 0]` and the second thread of the other two columns `[0 1]`.

Assume each thread is working on its first element. Each one sees that `v` is not zero and computes the value `v * M[i,j]`. For the first thread this will be 1 and for the second thread it will be 0, but since the operation is not atomic, the final value of `v` could be 0 or 1, depending on which thread performs its store last.
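This lost-update interleaving is easy to reproduce with a shared counter (a sketch of my own, not from the original reply; on a session with more than one thread the non-atomic count usually comes up short):

```julia
using Base.Threads

# Non-atomic increments: the read, add, and store can interleave
# between threads, so some updates are lost.
counter = Ref(0)
@threads for _ in 1:1_000_000
    counter[] += 1
end
println(counter[])        # often < 1_000_000 with multiple threads

# Atomic increments: each read-modify-write is indivisible.
acounter = Atomic{Int}(0)
@threads for _ in 1:1_000_000
    atomic_add!(acounter, 1)
end
println(acounter[])       # always 1_000_000
```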

Compare these two functions (modified from your code to work on vectors, and using addition instead). The first one, following your approach, exhibits the same problem. Using `mask_t` (with `atomic_add!`), the problem goes away:

```julia
using Base.Threads
using Random

function mask(v, M)
    @threads for i in eachindex(M)
        v += M[i]             # non-atomic read-modify-write on a shared variable
    end
    v
end

function mask_t(v, M)
    r = Atomic{eltype(v)}(one(eltype(v)))
    @threads for i in eachindex(M)
        atomic_add!(r, M[i])  # atomic read-modify-write
    end
    r[]
end

function test(seed=999, n=10^6)
    v = one(Int)
    Random.seed!(seed)
    M = rand(0:1, n)
    println(mask(v, M))
    println(mask_t(v, M))
end
```
```julia
julia> test()
126877
500543

julia> test()
126268
500543

julia> test()
126882
500543
```

Cheers,
You might have brought up atomics just as a demonstration. But, for completeness, let me mention that using atomics for this kind of computation is not an efficient approach (although it yields the correct answer). It's a good idea to use a threaded `reduce` whenever possible.
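For example, a chunked threaded sum can be sketched like this (an illustration of the idea; `tsum` is a hypothetical name, not code from the thread):

```julia
using Base.Threads

# Each task sums its own chunk into a private accumulator; the partial
# results are combined at the end. No shared mutable state, no atomics.
function tsum(xs)
    nchunks = max(nthreads(), 1)
    chunks  = Iterators.partition(eachindex(xs), cld(length(xs), nchunks))
    tasks   = [Threads.@spawn sum(@view xs[c]) for c in chunks]
    return sum(fetch.(tasks))
end
```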