Help understanding what happens in a `@threads` loop with race conditions

(I’m just a curious learner on this topic so please don’t consider what I say as an expert opinion. Having said that…)

I think this is expected. Although Julia does not specify its memory model, I guess it's safe to assume it aims to provide sequential consistency for data-race-free programs (SC for DRF), as C, C++ and Java do. In that case, as soon as you write a data race, your program's behavior is undefined and you cannot expect any sane behavior. This is because, unless you use some synchronization protocol (e.g., atomics), the compiler and the hardware (CPU/cache) are free to transform (optimize) your program into something else, as long as the difference is not observable in single-threaded execution. For example, the compiler might change the inner loop to load four elements from `v` at a time, do `v[i] *= M[i,j]`, and then write back all four elements even if they were not modified.
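To make the problem concrete, here is a minimal sketch of the kind of racy loop I mean (the names `v` and `M` are my assumption about the original code; the function name is hypothetical). Threads split the columns of `M`, so two threads can do an unsynchronized read-modify-write of the same `v[i]` at the same time:

```julia
using Base.Threads

# Hypothetical reconstruction of the racy pattern discussed above:
# every thread updates the *shared* vector `v`, so two threads can
# read-modify-write the same `v[i]` concurrently -- a data race.
function racy_scale!(v, M)
    @threads for j in axes(M, 2)
        for i in axes(M, 1)
            v[i] *= M[i, j]   # unsynchronized read-modify-write of shared v[i]
        end
    end
    return v
end
```

With more than one column and more than one thread, updates to `v[i]` can be lost, and (per the argument above) the compiler is allowed to make things even stranger than a simple lost update.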

FYI, I think this two-part talk is a nice introduction to this topic. In fact, my reply is solely based on what I learnt from it:

To write `mask!`, I'd just use a threaded map over the first axis of `M`, even though it makes the memory access of the inner loop non-contiguous. Of course, it'd be nice to store the transposed version of `M` if other parts of the program are OK with it.
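A race-free sketch along those lines (again assuming the `v` and `M` from the original question; the signature of `mask!` is my guess). Parallelizing over the first axis means each thread owns a distinct `v[i]`, so no synchronization is needed:

```julia
using Base.Threads

# Race-free sketch: parallelize over rows, so each thread is the sole
# writer of its `v[i]`. The inner loop strides across a row of `M`,
# which is non-contiguous for Julia's column-major arrays.
function mask!(v, M)
    @threads for i in axes(M, 1)
        acc = v[i]
        for j in axes(M, 2)
            acc *= M[i, j]
        end
        v[i] = acc
    end
    return v
end
```

Accumulating into the local `acc` also keeps the hot loop out of shared memory entirely; each `v[i]` is read once and written once.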
