Why doesn't multithreading over a vector/array lead to data race?

Hi All,
I understand the following code will lead to undefined behaviour

```julia
s = 0

Threads.@threads for ja in 1:1000
    global s += ja
end
```

as each thread is modifying the same variable at the same time.

But why doesn’t the following code lead to data race (OR does it)?

```julia
s = zeros(100, 200, 300)

Threads.@threads for jc in 1:300
    for ja in 1:100, jb in 1:200
        s[ja, jb, jc] += 10
    end
end
```

My confusion is that in the above code, each thread is still accessing and modifying the variable s in memory, and different threads are doing it at the same time. What is specifically different from the first example? There must be some difference between a variable that is a scalar (the first case) and an array (the second case) that I'm missing.

My guess is that in the second example, each thread only accesses and updates a slice of s, namely s[:, :, jc], not the whole s variable. Is that why the second case doesn’t lead to ill behaviour?

Each task only accesses its own “private” slice of your array, s[:, :, jc], so your code is data-race free.


In other words:
Say you have 3 threads available; your index variable jc runs over
thread 1: 1:100
thread 2: 101:200
thread 3: 201:300
so every thread is working on s in its own index range.
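You can observe this partition directly; a minimal sketch (note the exact chunking depends on the scheduler — recent Julia versions default to a dynamic scheduler, so the ranges are not guaranteed to be contiguous, but every index is still handled by exactly one task):

```julia
using Base.Threads

# Record which thread executed each iteration. With a static scheduler
# and 3 threads, owner[1:100], owner[101:200], owner[201:300] would each
# hold a single thread id; the dynamic scheduler may chunk differently.
owner = zeros(Int, 300)
@threads for jc in 1:300
    owner[jc] = threadid()
end
@assert all(1 .<= owner .<= nthreads())   # every slot written by some thread
```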

It’s because of how memory works. The first example, s += ja, translates to two memory operations. Loosely: “fetch the value of s (an 8-byte integer)”, then “add ja”, then “store the result back into s”.

If two threads fetch s simultaneously, they get the same value. Each updates it and stores the result. They can’t really store simultaneously: one will store first, then the other will store its value, so the first updated value will be lost.
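If you do need many threads updating one scalar, one common fix (a sketch, not the only option) is to make the fetch-add-store indivisible with Threads.Atomic:

```julia
using Base.Threads

s = Atomic{Int}(0)
@threads for ja in 1:1000
    atomic_add!(s, ja)   # the read-modify-write happens as one atomic step
end
@assert s[] == 500500    # sum(1:1000); no updates are lost
```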

Now, in your array example, the same fetch-update-store sequence happens, but different threads access different memory locations because they have different jc. That is, each thread does not fetch and store the entire array, only the particular location determined by ja, jb, jc. So there is no conflict.
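You can check this for the array version: every location is written by exactly one task, so no increments are lost (a small sketch with a reduced array size):

```julia
using Base.Threads

s = zeros(10, 20, 30)
@threads for jc in 1:30
    for ja in 1:10, jb in 1:20
        s[ja, jb, jc] += 10   # each (ja, jb, jc) is touched by one task only
    end
end
@assert all(s .== 10)   # every element was updated exactly once
```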

For some types of vectors, in particular BitVector, this is different. A BitVector is a vector of bits. The bits are packed, i.e. 64 bits are stored in a UInt64. When you have bv = BitVector(rand(Bool, 128)), it’s stored as two UInt64s. You can access individual bits with bv[23] or bv[78], but memory can’t be fetched or stored bit by bit; you have to handle at least a whole byte (8 bits). So if you do e.g. bv[23] ⊻= true to flip bit 23, behind the scenes the first UInt64 is fetched, the 23rd bit is flipped, and the entire UInt64 is written back. So you get the same problem as with your scalar update: different threads may happen to access the same memory location, i.e. the same UInt64.
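You can see the packing through the chunks field of BitArray (an internal implementation detail, so it may change between versions):

```julia
bv = BitVector(rand(Bool, 128))

# The 128 bits are packed into two 64-bit words; bv[23] and bv[60] live
# in the same UInt64, so writing either one rewrites that whole word.
@assert bv.chunks isa Vector{UInt64}
@assert length(bv.chunks) == 2
```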


Yes, and the other loop has been explained.

This is bad in any language, and not just because of global. I confirmed a data race either way; at first I didn’t see one, because I was running in the default single-threaded mode…

Then this is actually safe, but maybe shouldn’t be considered ok code, because the problem only manifests with threading; if you want it safe there too, you need to drop the @threads annotation.

I’m actually thinking this is worse, because if you have an outer loop in another function, then calling that one wouldn’t be safe either, I believe, though likely only with a global variable.

Should Julia error, at compile time, on such code referencing a (global) scalar? Even for single-threaded code with the annotation?

Note that in the generic case this could be perfectly valid code without any races, because (the type of) s could be thread-safe. For example, the + method could be implemented using a lock.
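As a hypothetical sketch of that idea (the type and method names here are made up for illustration), a scalar-like value whose update is guarded by a lock makes the shared-accumulator pattern race-free:

```julia
using Base.Threads

mutable struct LockedCounter
    value::Int
    lock::ReentrantLock
end
LockedCounter() = LockedCounter(0, ReentrantLock())

# The read-modify-write runs while holding the lock, so concurrent
# add! calls cannot interleave and lose updates.
function add!(c::LockedCounter, x::Integer)
    lock(c.lock) do
        c.value += x
    end
    return c
end

c = LockedCounter()
@threads for ja in 1:1000
    add!(c, ja)
end
@assert c.value == 500500   # sum(1:1000)
```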
