Sharing a reference across multiple threads

Suppose I have a reference that’s shared across two threads.

One of the threads only sets the reference, and the other only dereferences it. Does this need a lock around the reference? On setting, on dereferencing, or both?

What about if either thread can set it and dereference it? Same questions.

In fact, what are the exact rules for when a lock is needed?

I think I have quite a bit of experience on this topic.

First of all, you’ll have to distinguish between “mutation” (e.g. pop!) and “re-binding” (e.g. setfield!). The latter is a bit more tricky.

If there is only one thread doing setfield! on a reference while multiple other threads read it, you may get the desirable result where a reader sees either the old value or the new value, but never an intermediate, invalid one. In that situation you may get away without locks, but you’ll have to test the feasibility on your specific machine. Here is an example I devised:

As a rule of thumb, try it on your machine. If you find that the resulting behavior is acceptable or as expected even if you didn’t add locks, you can persevere (without locks).

Edit:
Carrying out experiments always helps; see post #15.

The rules for when a lock is needed are in Multi-Threading · The Julia Language

  • Base collection types require manual locking if used simultaneously by multiple threads where at least one thread modifies the collection (common examples include push! on arrays, or inserting items into a Dict).

So you do need a lock in this case.

In general, the Julia compiler will try to optimize your code assuming there is only one thread, unless you use locks, channels, or atomics. I found https://assets.bitbashing.io/papers/concurrency-primer.pdf to be helpful for understanding atomics.

The first response to your question is not correct, especially this advice:

As a rule of thumb, try it on your machine. If you find that the resulting behavior is acceptable or as expected even if you didn’t add locks, you can persevere (without locks).

Race conditions often don’t appear during testing, or appear only very sporadically due to the interactions with the environment. It’s not sufficient to check a few times and move on.

In fact, what are the exact rules for when a lock is needed?

In all the cases you’ve described, I would recommend a lock.

In the simplest case where one thread writes and the other reads, it is possible that the reader may read the memory mid-write, in which case you risk reading an intermediate, garbage value. When there are multiple writers, the situation is compounded.

The main alternative to a lock is atomics, which are an option when writing to a limited set of supported primitive types; see here: Multi-Threading · The Julia Language

4 Likes

This is the wrong question. Locks are not a “primitive”, they are themselves implemented via atomics.

Your question must be: What are the exact rules for when atomics are needed (and what are the recommendations on whether to use them “raw” or in the form of locks)? Which memory ordering do I need?

The short of it is: ideally, other operations establish that your write happens-before the read. Then everything works nicely, as expected. Example:

r = Ref(1)
r[] = 2
@assert fetch(Threads.@spawn r[]) == 2

The write happens-before the read, so you don’t need to do anything. (i.e. locks / atomics inside the julia runtime ensure this)

If nothing ensures any ordering, i.e. your read may naively give either the old or new value, then you need to do something: Lock or atomics (:unordered can be enough).

Otherwise, you can realistically get tearing (i.e. read a value that is a mix between old and new value), or complete garbage behavior (i.e. the compiler did optimizations that you did not expect – formally the read returns undef. We’re not C-barbarians, this ain’t bat-out-of-nose UB, but undef is pretty close).

Example where atomics/locks are needed:

r = Ref(1)
t = Threads.@spawn r[]
r[] = 2
fetch(t) # in practice returns 1 or 2, but formally the read is `undef`, and that will mess you up!
1 Like

That makes sense, but how can you tell from the Julia manual that the compiler is not able to rewrite:

r = Ref(1)
r[] = 2
@assert fetch(Threads.@spawn r[]) == 2

As more like:

r = Ref(1)
@assert fetch(Threads.@spawn r[]) == 2
r[] = 2

Without a lock around r?

Base collection types require manual locking if used simultaneously by multiple threads where at least one thread modifies the collection (common examples include push! on arrays, or inserting items into a Dict).

Thank you for the linked PDF; I will read it.

I’ve read the docs. Your quote says “Base collection types require (…)” - so is a Ref a collection type? At first I’d say no, but in reality its set and get operations are named setindex! and getindex, so maybe yes? I mean, I know the answer is yes only because there’s an example in the docs that uses a Ref (ctrl-f Ref(0)).

This appears to be some jargon; translated:

  • “This isn’t the worst kind of undefined behavior, but it’s still dangerously close to it.”

(I asked AI.)

We have discussed this many times, and I still hope that someone can come up with a concrete, reproducible example where a “mix between the old and new”, or some other invalid object, is returned.

I don’t follow you.

This is merely sequential programming, so the ordering takes effect normally. (No?)

Run with multiple threads:

julia> mutable struct Flag
           @atomic x::Bool
       end; flag = Flag(true);
julia> function exhibit_tearing(flag)
           ref = Ref((0,0,0,0))
           reader = Threads.@spawn begin
               while @atomic flag.x
                   (r1, r2, r3, r4) = ref[]
                   ( r1 == r2 == r3 == r4) || throw((r1,r2,r3,r4))
               end
           end
           for i=1:1_000_000_000
               @atomic flag.x = true
               ref[] = (i,i,i,i)
           end
           @atomic flag.x = false
           fetch(reader)
           nothing
       end

julia> exhibit_tearing(flag)
ERROR: TaskFailedException
Stacktrace:
 [1] #wait#585
   @ ./task.jl:363 [inlined]
 [2] wait
   @ ./task.jl:360 [inlined]
 [3] fetch
   @ ./task.jl:525 [inlined]
 [4] exhibit_tearing(flag::Flag)
   @ Main ./REPL[3]:14
 [5] top-level scope
   @ REPL[4]:1

    nested task error: (900314, 900314, 900314, 900315)
    Stacktrace:
     [1] (::var"#exhibit_tearing##0#exhibit_tearing##1"{Flag, Base.RefValue{NTuple{4, Int64}}})()
       @ Main ./REPL[3]:6

The atomic Flag in this example is here to force the compiler to not optimize the loop away. In other words, above is not a case of “the compiler mis-optimized your stuff”, it is an example of “the compiler did exactly what you asked of it, you simply misunderstood your processor manual”.

To see an example of the compiler fucking you up, consider

julia> function foo(r)
           while r[] end
           nothing
       end
julia> @code_llvm foo(Ref(true))
; Function Signature: foo(Base.RefValue{Bool})
;  @ REPL[4]:1 within `foo`
define void @julia_foo_2866(ptr noundef nonnull align 1 dereferenceable(1) %"r::RefValue") local_unnamed_addr #0 {
top:
  %0 = load i8, ptr %"r::RefValue", align 1
  %"r::RefValue.x" = trunc i8 %0 to i1
;  @ REPL[4]:2 within `foo`
  br i1 %"r::RefValue.x", label %L1, label %L4

L1:                                               ; preds = %L1, %top
  br label %L1

L4:                                               ; preds = %top
  ret void

Naively you might have assumed that you can end the loop by setting r[] = false in a different thread. Alas, you cannot: there is no synchronization / happens-before inside the loop, so the compiler knows that in later loop iterations r[] returns either the same value or undef, and it optimizes the check away (i.e. it checks only once and re-uses the result over all loop iterations).


I’m sorry to sound harsh, but this is not a julia issue, it is a basic programming issue.

All the “you need locks on collections” is meant as a shorthand for “julia decided not to add locks to most of the standard collections; if you need locks, add them yourself”. This is important info for experienced people – there are many reasonable design choices that julia could have made!

It is meant for people who know what a race and a lock is, not for beginners in programming.

Teaching programming to beginners is not an easy thing. I don’t have anything good to recommend you in terms of “introduction to programming”. But the julia docs won’t immediately answer that for you, nor will any cool julia implementation details, nor any single hard and fast rule.

The real rule is complex, and only seems simple for people with more background (in other words, it’s very simple, and only seems complex to beginners, just like “how do I solve systems of linear equations / Gauss algorithm” in high school). You can come at it from a practical or theoretical side, but ultimately you will need to learn both. So I’m not sure how helpful the following is to you, but I honestly can’t do better.


The basic model to follow: Stuff happens in the order you expect. Stuff inside a thread / task happens after the thread/task has been scheduled (via e.g. Threads.@spawn), and happens before the task ends. The return of fetch / wait on a task happens after the task ends.

If you have two accesses, at least one of them a write, on the same thing and neither happens before the other, then you have a race.

r = Ref(1) #A
t = Threads.@spawn begin #B
    r[] #C
end
r[] = 2 #D
fetch(t) #E

In this case A < B < D < E and B < C < E, but C and D have no happens-before ordering and happen “concurrently”. And at least one of C, D is a write. Hence, race.

You need to do something to address this race. Ideal is to architect your code such that races don’t happen. For example, put line D before B or after E, depending on whether you want to read 1 or 2.

If that doesn’t work, atomics. Alternatively, locks (locks are atomics in a trenchcoat, but are easier to understand for some people).

9 Likes

For a drive-by comment, I wrote a piece explaining atomics, locks, and concurrency in Julia from the basics: https://viralinstruction.com/posts/threads/

You might find it useful.

7 Likes

Thanks, wonderful example indeed.

What I found is that when the content is a small tuple of integers, I can very easily observe an invalid result. But when the tuple is replaced with a Vector, I fail to observe any invalid result.

I tested with --threads=8,1:

function gen(c) # Two settings that lead to stark contrasting results
    c % 2 == 1 ? (1,1,1,1) : (0,0,0,0)
    # c % 2 == 1 ? [1,1,1,1] : [0,0,0,0]
end;
function check(r)
    s = getfield(r, :x)
    if s == gen(0)
        # @ccall(printf("0\n"::Cstring)::Cint)
    elseif s == gen(1)
        # @ccall(printf("1\n"::Cstring)::Cint)
    else # can you see this?
        @ccall(printf("EEEEEEEEEEEEEEEEEEEE\n"::Cstring)::Cint)
    end
end;
function test(r)
    for c = 1:1_000_000_000
        for j = 1:14
            Threads.@spawn check(r)
        end
        # @ccall(printf("%d\n"::Cstring; c::Cint)::Cint)
        setfield!(r, :x, gen(c))
    end
end;
test(Ref(gen(0)))

I’m not a computer science student, but I guess it might be due to stack vs. heap storage.

I literally just cannot get a wrong result in the Vector setting (or with more complex structs) on my specific machine, in all my tests…

Edit: see post #15.

But one takeaway here is that the question “is a lock needed?” can be confirmed via tests, like the one written in my post #15.

The issue is not just the order in which the compiler emits instructions as machine code. The CPU, or at least some CPUs, can issue instructions in a different order if it deems there are no dependencies. Also, writing to memory takes place in parallel and can take hundreds of CPU cycles. If there’s no indication that writes must be completed before a certain read takes place, the CPU is free to reorder writes and reads. In this concrete case, I guess both fetch and @spawn issue some memory barrier instruction which ensures reordering does not take place across it.

https://medium.com/@coders.stop/how-modern-cpus-reorder-your-code-without-permission-cc46fc5394e2

No, both stack and heap are memory; there’s no essential difference. It’s likely because when you use a vector, gen(c) returns a reference (basically a pointer) to either a vector of ones or a vector of zeros. No vector is ever modified. It just so happens that your CPU reads and writes pointers (8 bytes, I guess) atomically. When you use a tuple, 32 bytes are read/written, and the CPU/memory does not read/write them atomically.

3 Likes

Okay, I’m now fully convinced that setting/dereferencing a shared reference requires locks. (I’m not sure if something has changed; last time I did tests with Julia 12, where the behavior was robust, IIRC.)

Here is the code that fails without locks (run with --threads=8,1).

struct Content
    a::Vector{Int}
    b::Vector{Float64}
    Content(c) = c % 2 == 1 ? new([1, 1, 1, 1], [1.0, 1.0, 1.0]) : new([0, 0, 0, 0], [0.0, 0.0, 0.0])
end;
function isValid(s)
    a, b = s.a, s.b
    (all(==(1), a) && all(==(1), b)) && return true
    (all(==(0), a) && all(==(0), b)) && return true
    return false
end;
function check(r)
    s = @lock srlock getfield(r, :x)
    isValid(s) || @ccall(printf("EEEEEEEEEEEEEEEEEEEE\n"::Cstring)::Cint)
end;
function test(r)
    for c = 1:100_000
        for j = 1:14
            Threads.@spawn(check(r))
        end
        s = Content(c)
        @lock srlock setfield!(r, :x, s)
    end
end;
const srlock = ReentrantLock();
const r = Ref(Content(0));
const t = Threads.@spawn(test(r));
wait(t)

I’ve bookmarked this topic. When I have more free time, I’ll come back and read it carefully.