Understanding memory model for multi-threading

Hi,

I’ve been reading about and experimenting with multi-threading in Julia (https://docs.julialang.org/en/latest/manual/parallel-computing/, and https://docs.julialang.org/en/latest/base/multi-threading/).

When using an @threads for loop, is memory access guaranteed to have sequential consistency?

For example, I know the following should fail

using Base.Threads
@show nthreads()
a = zeros(Int64,1)
iterCount = 1000
@threads for i = 1:iterCount
    a[1] += 1
end
assert(iterCount == a[1])

returns

nthreads() = 2
AssertionError:

Stacktrace:
[1] assert(::Bool) at ./error.jl:68
[2] include_string(::String, ::String) at ./loading.jl:515

But what about the code below?

using Base.Threads
@show nthreads()
a = zeros(Int64,1)
alock = SpinLock()
iterCount = 100000000
@threads for i = 1:iterCount
    lock(alock)
    a[1] += 1
    unlock(alock)
end
assert(iterCount == a[1])

returns

nthreads() = 2

In that run, all increments were successful. Can I rely on this behavior? Or is it possible that threads will see an outdated version of a[1]?

I’m asking because I’ve been writing more complicated data structures which I would like to have multiple threads access using locks to achieve mutual exclusion. I could modify the data structures to use atomics if necessary, but would much prefer to avoid that.
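
For reference, the atomics alternative mentioned above might look roughly like the following for the simple counter case. This is only a sketch: `atomic_count` is a hypothetical helper name, and it covers a single integer counter, not a more complicated data structure.

```julia
using Base.Threads

# Hypothetical helper: count to n using an atomic counter instead of a lock.
function atomic_count(n::Int)
    counter = Atomic{Int}(0)
    @threads for i in 1:n
        # Atomic read-modify-write; no increment can be lost.
        atomic_add!(counter, 1)
    end
    return counter[]  # read the final value
end

@assert atomic_count(1000) == 1000
```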


No.

Yes, this is well defined.

Thanks for the quick reply!

is memory access guaranteed to have sequential consistency?

No.

In that run, all increments were successful. Can I rely on this behavior? Or is it possible that threads will see an outdated version of a[1]?

Yes, this is well defined.

I’m confused about how these two answers go together. If sequential consistency does not hold for the memory, I would have thought that the second (lock-based) loop could also fail?

Let me try to be more precise. Can I assume that all operations that occur inside a lock

  1. occur atomically together?
  2. are visible to another process that later acquires the lock?

no and no.

Yes and no. The loop does not play any role; the lock does. The full explanation is too long to give here accurately and without being misleading, but the point that matters for this case is that a lock acquire is synchronized with the lock release, and both are synchronized with the critical section on the thread that executes them. For a full explanation of memory ordering, including how locks fit into the picture, see the talk series by Herb Sutter: Shows | Microsoft Learn
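
As a minimal sketch of the lock-based pattern discussed in this thread, the counter example can be wrapped so that every read and write of the shared array happens while the lock is held. `locked_count` is a hypothetical helper name, and `ReentrantLock` with the `lock(l) do ... end` form is used here instead of the `SpinLock` from the question so that the lock is released even if the body throws.

```julia
using Base.Threads

# Hypothetical helper: increment a shared array element n times under a lock.
function locked_count(n::Int)
    a = zeros(Int, 1)
    l = ReentrantLock()
    @threads for i in 1:n
        # The do-block form acquires the lock, runs the body,
        # and always releases the lock afterwards.
        lock(l) do
            a[1] += 1
        end
    end
    return a[1]
end

@assert locked_count(1000) == 1000
```

The same discipline would apply to a larger data structure: the guarantee comes from every thread taking the same lock around every access, not from the `@threads` loop itself.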