Threads, synchronization, consistency

I’m following the documentation here.

I’ve run this code from the documentation:

n = 10
a = zeros(n)
Threads.@threads for i = 1:n
    a[i] = Threads.threadid()
end
a

I’m aware of @spawn and have also used it, but I don’t think it’s relevant here.

I haven’t found a detailed description of the @threads macro’s synchronization/consistency protocol. My understanding is that, as far as my program is concerned, those threads effectively live for the duration of the for loop, and that execution is single-threaded thereafter. However, thinking back to my “operating systems” college course, these threads will presumably have run on different cores and maybe even different processors, and the memory hierarchy of my computer can be very complicated.

Again, thinking of my “operating systems” course, you’d need some sort of memory barrier to force all processors and cores to flush their caches to main memory and invalidate everything, to ensure that the rest of the program can now access the a array and obtain the expected result. If this is not done then, theoretically, printing a at the end of the loop above could yield any arbitrary thing.

Reading the documentation, I’m not seeing where it says that Threads.@threads ensures that memory is in a synchronized, consistent state at the end of the loop. Is this a guarantee? Did I just fail to see it somewhere? Or is there a multithreading primitive that I have not seen that must be used to synchronize all processors and cores after the multithreaded loop completes?

Thanks, and sorry for the basic question.

@threads is a bit misnamed: it doesn’t spawn OS-level threads at all, but Julia-level tasks that are multiplexed over the number of OS threads Julia runs with.

It doesn’t - you can very much get race conditions, false sharing, and all the complexity you’d otherwise expect of multithreaded code. All @threads guarantees is that the loop it encloses acts as a block: execution of the following code continues only after every task spawned by @threads has finished.
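To make both halves of that concrete, here is a small sketch (my own illustration, not from the documentation): the first function races on a single shared location and can lose updates when run with multiple threads, while the second has each iteration write a distinct index, and relies only on the guarantee that @threads joins all its tasks before the following code runs.

```julia
using Base.Threads

# Racy: every task does an unsynchronized read-modify-write on the same Ref.
function racy_count(n)
    total = Ref(0)
    @threads for _ in 1:n
        total[] += 1          # lost updates are possible with >1 thread
    end
    return total[]            # may come out smaller than n
end

# Safe: each iteration writes a distinct index, and @threads joins all
# tasks before returning, so the read of `a` afterwards is well-defined.
function per_index_count(n)
    a = zeros(Int, n)
    @threads for i in 1:n
        a[i] = 1
    end
    return sum(a)             # always exactly n
end
```

Started with several threads (e.g. `julia -t 4`), `racy_count(1_000_000)` will often return less than a million, while `per_index_count` is always exact.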

See also this post on the julialang blog for some information/patterns on avoiding that.


I’m not sure this answers my question. Take specifically the PSA link you posted. If I understand what you write, the PSA wrongly instructs us to use :static, which does not appear to have any more of a memory barrier to it than :dynamic. Could you please tell me how to put a memory barrier in the :static example from the PSA, after all the threads have wound down?

I cannot use the @spawn solution because it causes hundreds of allocations.

Edit: also, it doesn’t say anywhere that @spawn has a memory barrier. I need to know whether Julia ensures that memory state is consistent at the completion of threads and tasks and spawns and whatever else, and if not, what I need to do to synchronize all the processors.

Tasks have an implicit memory barrier. Memory accesses within tasks have the expected consistency, and when you wait on or fetch a task you establish a happens-before relationship.
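A minimal sketch of that happens-before edge (my own example): everything the task wrote before completing is visible to the code that runs after fetch returns.

```julia
using Base.Threads

# fetch (like wait) blocks until the task is done and establishes a
# happens-before edge: the task's writes to `buf` are visible afterwards.
function spawn_and_read()
    buf = zeros(Int, 100)
    t = @spawn begin
        for i in eachindex(buf)
            buf[i] = i
        end
        sum(buf)
    end
    s = fetch(t)              # synchronizes with the task's completion
    # Reading buf here is race-free and sees all of the task's writes.
    return s == sum(buf)      # true
end
```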

@threads creates a number of tasks under the hood and synchronizes them afterwards. So there is an implicit memory barrier.

We probably need to document these things better, especially once atomics for arrays have been added to the language. (Right now you need Atomix.jl for atomic operations over arrays.)
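For scalar shared state, Base does already provide atomics without any extra package; a minimal sketch using Threads.Atomic (this is Base’s scalar atomic, not the per-element array atomics discussed above):

```julia
using Base.Threads

# Base's scalar atomic wrapper: concurrent increments without a lock.
function atomic_count(n)
    counter = Atomic{Int}(0)
    @threads for _ in 1:n
        atomic_add!(counter, 1)   # sequentially consistent by default
    end
    return counter[]              # always exactly n, on any thread count
end
```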


OK, so I think that’s my answer. I now understand that in the snippet from my original post there is an implied memory barrier at the end of the for loop, so when a is displayed, that happens after the memories of all the processors have been synchronized.

With the caveat that this is only (technically) guaranteed for atomic stores and writes.


OK well now you have me confused. Is it guaranteed for my original program, which was taken from the documentation?

What is the Julia definition of an atomic store or write?

For every index, you are guaranteed to get some value that was actually written to that index; you cannot get an arbitrary value. That is, in your example you get some threadid(). Because of task migration, the values may not be in ascending order, for example - it depends on the order in which the tasks ultimately execute, which threadid() they run on, and whether there are any additional data races.
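To illustrate (my own variation on the original snippet): every entry of a ends up being a real thread id, but which id lands at which index is up to the scheduler.

```julia
using Base.Threads

n = 10
a = zeros(Int, n)
Threads.@threads for i in 1:n
    a[i] = threadid()
end
# Every entry is some valid thread id (>= 1), but the assignment of ids
# to indices depends on scheduling and, with :dynamic, on task migration.
all(>=(1), a)
```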

Specifically, what I’m thinking of is that “obtain the expected result” depends highly on what that expectation is. After the loop, you’re not going to read stale data that was overwritten on another thread, if that’s what you’re asking: you’re guaranteed to read the last value written.