Calling max() atomically on a compound type

I’ve attempted to understand the Julia docs on the newish @atomic functionality multiple times, and I’m still struggling to fully understand it. The docs are actually quite sparse on the topic, and what’s worse is that they point to the ‘atomics manifesto’ design document, which is clearly only partially implemented in Julia at present.

As an example, I have:

struct Result
    score::Int
    action::Union{Int, Nothing}
end

Base.isless(r1::Result, r2::Result) = isless(r1.score, r2.score)

And I want to atomically set the maximum result from within multithreaded code, e.g.:

best::Result = max(best, candidate)

The manifesto suggests I could do something like

(@atomic best)::Result = Result(typemin(Int), nothing)
@atomic max(best, candidate)

Unfortunately, this isn’t yet supported, so it looks like I’m forced to create a wrapper struct with the field marked as atomic, e.g.:

struct Best
    @atomic result::Result
end

@atomic best = Best(Result(typemin(Int), nothing))
@atomic max(best.result, candidate)

However, this JuliaCon talk suggests there are cases where atomic lines (like @atomic a.x = a.x + 2) are not thread-safe. In lieu of better documentation, it’s not at all clear to me when my code will be thread-safe, and I’m unsure whether the above code is itself thread-safe.

What’s the best way to do something like this?

@atomic operations are only supported on mutable structs - there is no sense in which the field of an immutable struct can be set (or read) in an atomic way, as the field is defined by value. There are no data races on immutable fields.

If anything, you’d have

struct Result
    score::Int
    action::Union{Int, Nothing}
end

Base.isless(r1::Result, r2::Result) = isless(r1.score, r2.score)

mutable struct Best
    @atomic res::Result
end

b = Best(Result(typemin(Int), nothing))
r = Result(0, 0)

@atomic max(b.res, r)

which will store max(b.res, r) in b.res.

I’m not quite sure which part of the talk you’re referring to, but (if my assumption is correct) you’re asking about the difference between @atomic a.x = a.x + 2 and @atomic a.x += 2, correct? The difference is that in the second case, the whole operation (reading a.x, adding 2, and storing the result back in a.x) is guaranteed to be done in one uninterrupted swoop. In contrast, @atomic a.x = a.x + 2 does not guarantee that - reading from an @atomic variable is always free of data races, and so that code is exactly the same as

tmp = a.x + 2
@atomic a.x = tmp

It can now happen that between computing tmp (which truly is what happens under the hood) and storing it in a.x, another thread has already overwritten a.x. The individual accesses are still atomic, but the update is lost: tmp was calculated from the old value of a.x, and storing it overwrites the result the other thread already wrote.
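A minimal sketch of the safe form (the type name Counter is illustrative, not from the thread): with @atomic c.x += 2, each read-modify-write is one indivisible operation, so the final count is exact no matter how many threads run.

```julia
mutable struct Counter
    @atomic x::Int
end

c = Counter(0)
Threads.@threads for i in 1:1000
    @atomic c.x += 2   # read, add, and store happen as one atomic operation
end
@atomic c.x            # always 2000, regardless of the thread count
```

Replacing the loop body with tmp = (@atomic c.x) + 2; @atomic c.x = tmp would reintroduce exactly the lost-update problem described above.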

For reduction operations like those in your example, it’s usually fastest to not use a shared variable for the main part of the reduction and instead reduce per-thread. Afterwards, reduce the per-thread reductions again, which should be possible to do in single threaded execution.
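A sketch of that pattern (the function name parallel_maximum and the chunking strategy are illustrative): split the input into chunks, reduce each chunk in its own task with no shared state, then combine the per-task results single-threaded.

```julia
function parallel_maximum(items)
    nchunks = Threads.nthreads()
    chunks = Iterators.partition(items, cld(length(items), nchunks))
    # each task reduces its own chunk; no shared mutable state, no atomics
    tasks = [Threads.@spawn maximum(chunk) for chunk in chunks]
    # final reduction over the per-task results, done on a single thread
    maximum(fetch.(tasks))
end
```

The same shape works for any associative reduction (sum, min, a custom max over Result values via the isless defined above) by swapping out maximum.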

Explicit sharing of state like this is generally useful when the operation you’re doing is not associative or the order of operations matters a lot - something that’s sadly missing from many illustrative examples of data races, such as the summation presented in the JuliaCon talk.

Thanks for your reply. In my case, the order of operations matters, so a per-thread reduction isn’t possible (it’s a parallel alpha-beta pruning implementation).

The example code you gave is the second example I gave.

I guess my concern about the atomicity of the example comes from my limited understanding of how atomics are implemented in hardware. My understanding was that certain primitive values could be atomically manipulated by a limited set of low-level operations. But here, max is doing some custom CPU work between reading the value and ultimately writing it back, and I’m surprised that this can all be done atomically - since there’s no longer any low-level primitive available for my comparison. Or is Julia behind the scenes swapping this out for a lock?

And what if the work done as part of the call to max is substantial?

Yes, except your second example doesn’t work because Best is not mutable :slight_smile:

julia> struct Best
           @atomic result::Result
ERROR: invalid field attribute atomic for immutable struct
 [1] top-level scope
   @ REPL[1]:1

At the hardware level, only atomicity of some operations (usually on things that fit in a single CPU register/instruction) is guaranteed, yes. If you think about it, it’s the same as working with immutable structs in julia - you can’t change just a single bit of a register, instead you load the value, create an entirely new one, and then write all of that result into the register, overwriting all bits that were there before. Similarly, in julia you can’t just change a single field of an instance of an immutable struct, you have to create a whole new instance (possibly composed of parts of the old instance) and assign the result to the existing variable (thereby replacing the old value entirely).

The guarantees @atomic provides are due to the julia runtime & memory model giving additional guarantees about internal pointer accesses (i.e. fields of mutable objects), which is where all the tricky data races take place. So what @atomic order max(a.x, b) does is call Base.modifyproperty!(a, :x, max, b, order) (order defaults to :sequentially_consistent), which in turn calls Core.modifyfield! with the same arguments, which ultimately ensures from the runtime that the modification is done atomically. Behind the scenes, this does fall back to a lock if the object in question does not fit in a register, yes (see julia/datatype.c at c4fd8a48383235e818c44231d7507168c27f393b · JuliaLang/julia · GitHub).
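A sketch of that equivalence, using an Int field for brevity (the type name Box is illustrative) - the macro form and the explicit modifyproperty! call do the same thing, and both return the old => new pair:

```julia
mutable struct Box
    @atomic val::Int
end

box = Box(3)
@atomic max(box.val, 5)                                            # macro form
Base.modifyproperty!(box, :val, max, 7, :sequentially_consistent)  # what it lowers to
@atomic box.val   # the field now holds 7
```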

That’s a good question :thinking: A cursory look at the source linked above just has a call to jl_apply_generic, suggesting to me that it just… does whatever the called binary function does? As written, I don’t see anything ensuring the load & op are protected, but I may be missing something in terms of the fencing (there is a conditional jl_fence after all). Maybe @jameson can clarify?

All other atomics can be implemented with just one primitive: compare-and-swap - indeed, on many architectures, this is the only way to implement them. Many languages unfortunately don’t expose this flexibility (some do, though, such as Rust’s AtomicI8 in std::sync::atomic). Your update function can be arbitrarily expensive to compute, but if it takes more than a few dozen CPU cycles, you might want to consider a better/faster strategy than atomics for updating the data, such as using a lock. And if it is not a pure update function (i.e. not a function of its arguments alone, free of other side effects), you must be aware of the ABA problem and ensure it does not affect your algorithm. However, max is usually pure, so that is not a concern here.
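To make the compare-and-swap point concrete, here is a sketch of atomic max built by hand from Julia’s @atomicreplace (the names Cell and cas_max! are illustrative): the store only succeeds if the field still holds the value we read, and we retry otherwise.

```julia
mutable struct Cell
    @atomic x::Int
end

function cas_max!(c::Cell, v::Int)
    while true
        old = @atomic c.x
        v <= old && return old       # already large enough; nothing to do
        # store v only if c.x still equals old; returns (old=..., success=...)
        (; success) = @atomicreplace c.x old => v
        success && return old        # we won the race
        # another thread changed c.x in the meantime; loop and retry
    end
end
```

The update function (here the comparison against v) can be arbitrary code: atomicity comes from the CAS loop, not from the operation itself, which is why @atomic max(a.x, b) needs no dedicated hardware "max" instruction.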


So one further question regarding the syntax that’s not clear to me:

From the documentation, the function:

mutable struct A; @atomic x::Int; end
a = A(4)
@atomic max(a.x, 1)

will apparently update a.x with the value returned by max(). This seems very confusing - my side-effect-free function suddenly becomes mutating in this context.

Presumably, the following is not race free?

@atomic a.x = max(a.x, 1)

Moreover - what if I wanted to execute max() purely for its side-effects, and to have no effect on a.x? Would this be written as:

max((@atomic :modifier a.x), 1)

I’m sure lots of thought went into this syntax, but it definitely doesn’t seem intuitive to me, and I feel the docs really skim over the details.

And to add another one, suppose I have two instances of A:

a = A(3)
b = A(2)

@atomic max(a.x, b.x)

This will do an update of a.x but not b.x. So: the syntax only works with the first variable? Or is it the first atomic variable?

OK to answer my own question:

@atomic :modifier a.x f arg2

is the canonical syntax; it allows a.x to be modified in place with function f, where f is limited to just two input parameters.

@atomic :modifier f(a.x, arg2)

is an alternative, and imo, extremely confusing syntax.

In both forms, only the first argument a.x will be modified, and the function f takes exactly two arguments in total.
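A short demonstration of both forms side by side (the type name A2 is illustrative; both forms store the result back into the field, and a bare @atomic a.x is just an atomic read with no mutation):

```julia
mutable struct A2
    @atomic x::Int
end

a = A2(3)
@atomic a.x max 10    # field-op-argument form: stores max(a.x, 10) into a.x
@atomic max(a.x, 20)  # equivalent call-style form: stores max(a.x, 20)
@atomic a.x           # plain atomic read; does not modify anything
```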

The number of atomic operations matches the number of a.x expressions. How many are required, and where, depends on the algorithm you are implementing, and can be quite challenging to get right. When in doubt, use a lock - locks are implemented with just two atomic operations anyway.