Please help me to understand the `OhMyThreads.jl` parallel mutation example

Hi all,

The documentation of OhMyThreads.jl contains an example of an apparently very simple parallelization:

# Base.Threads
using Base.Threads: @threads
data = rand(10)

@threads for i in eachindex(data)
    data[i] = calc(i)
end

However, it comes with this clear warning:

Parallel mutation of non-local state, like writing to a shared array, can be the source of correctness errors (e.g. race conditions) and big performance issues (e.g. false sharing). You should carefully consider whether this is necessary or whether the use of thread-safe storage is the better option. We don’t recommend using the examples in this section for anything serious!

This made me wonder if my mental model of multi-threading in Julia is completely wrong. The example corresponds to my most-used parallelization pattern.

  • Is the code above incorrect assuming calc is a pure function?
  • Is the code above incorrect if calc uses part of the data as inputs, i.e. calc(data[i], i)?
  • Is it just about efficiency?
  • What is the most correct way to implement such a loop?

I would be very happy if somebody could help me to understand these points better.

Thanks a lot!
Andreas


The code examples there are not incorrect (as far as we know). The problem is just that multithreaded mutation is hard to get right.

Once you start getting into complicated examples, it’s very very easy to make a mistake when dealing with mutation and multithreading.

For this reason, we strongly recommend that users try to write code that does not require any mutation, or at the very least, only mutates task-local state.

It generally just scales better and is more reliable.


The code

@threads for i in eachindex(data)
    data[i] = calc(i)
end

is correct if data is a `Vector`, but not necessarily if data is something else, like a `BitVector` (e.g. data = falses(10)). In that case, 64 values are packed into a single `UInt64`, which is read and written as a whole, so concurrent updates to different indices can interfere with each other.
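A quick way to see the distinction: Vector{Bool} stores one byte per element, so parallel writes to distinct indices are safe, while falses(n) returns a BitVector that packs 64 elements into each UInt64. A minimal sketch (calc here is a hypothetical stand-in for a pure function):

```julia
using Base.Threads: @threads

calc(i) = isodd(i)  # hypothetical stand-in for a pure function

# Safe: Vector{Bool} gives each element its own byte in memory
data = Vector{Bool}(undef, 10)
@threads for i in eachindex(data)
    data[i] = calc(i)
end

# Unsafe in general: falses(10) is a BitVector, so writing data[i]
# rewrites the whole UInt64 word shared with 63 neighboring indices.
```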


Thanks a lot for this great example! I was not aware that not all vectors are the same in this respect. This would lead to a very nasty bug indeed.

Thanks! I see how things can get messy when we have some local mutations like in the example given here.

However, can I avoid mutations when I need to store all results of calc(i) somehow? This situation is quite common in Monte Carlo simulations, where I often do not want to reduce the results directly.


For that you can do

# OhMyThreads
using OhMyThreads: @tasks

data = @tasks for i in 1:10
    @set collect=true
    calc(i)
end

or

using OhMyThreads: tmap

data = tmap(i->calc(i), 1:10)

or

using OhMyThreads: tcollect

data = tcollect(calc(i) for i in 1:10)

Thank you very much! I’ve missed the @set collect=true option.

I think it may be useful to add this simpler example to the docs, as it covers a quite common use case. The current thread-safe storage examples are rather complicated.

That example is in the docs, right under the one you linked to.

The thing to remember is: You have a data race if there are multiple accesses to the same address, at least one of which is a write.
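To make that rule concrete, a minimal sketch: incrementing a shared counter with a plain += from several threads is a data race (multiple accesses to the same address, at least one of which is a write), while Threads.Atomic makes the increment safe:

```julia
using Base.Threads: @threads, Atomic, atomic_add!

n = 10_000
counter = Atomic{Int}(0)  # atomic cell: concurrent increments don't race
@threads for _ in 1:n
    # a plain `c += 1` on a shared Int here would be a data race
    atomic_add!(counter, 1)
end
counter[]  # == n regardless of the number of threads
```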

There are techniques that help you avoid data races, using high-level abstractions, like e.g. “don’t ever overwrite memory”, aka persistent/functional datastructures, or “mutating threads have exclusive ownership”, aka “there is no way of accessing the object from other threads”. In some languages like e.g. Rust, there is a lot of compiler support to help you not mess up.
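The "exclusive ownership" idea above can be sketched without any library support: split the index range into disjoint chunks, let each task build its own local result vector (which no other task can reach), and only combine the pieces after all tasks finish. (calc is again a hypothetical pure function.)

```julia
using Base.Threads: @spawn

calc(i) = i^2  # hypothetical pure function

chunks = Iterators.partition(1:10, 3)
tasks = map(chunks) do chunk
    # each task mutates only a vector it created itself: exclusive ownership
    @spawn map(calc, chunk)
end
data = reduce(vcat, fetch.(tasks))
```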

Since abstract high-level techniques have a tendency to either incur high runtime costs, or involve lots of decorum, it is totally fine to just write data-race free code without any abstraction to help you. This is not a code smell.

But the above low-level definition of a data race doesn’t play nicely with abstractions like AbstractVector.

So by opting in to the more fine-grained view of data races, you require that you and all future readers/maintainers of your code pierce the veil of abstraction, and think not about “mutate an object” but rather about “issue the following store to memory”, i.e. keep the generated @code_llvm / @code_native in mind (otherwise they will shoot themselves in the foot with concurrent modification of BitArray, SparseArray, Dict, etc.).
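For containers like Dict, whose internal layout can be rewritten on every insertion (rehashing), the straightforward fix is to serialize the mutation behind a lock. A sketch; in real code you would keep the locked region as small as possible:

```julia
using Base.Threads: @threads

d = Dict{Int,Int}()
lk = ReentrantLock()

@threads for i in 1:100
    v = i^2  # do the actual work outside the lock
    lock(lk) do
        d[i] = v  # Dict insertion may rehash: never safe concurrently
    end
end
```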

I personally think that this is appropriate for julia: It is not a deeply specced virtual machine, it is a tool + runtime to conveniently express the code you want, where you are both empowered and required to fluently switch between high-level (“mutate that thing!”) and low-level (“these two memory accesses don’t alias”) and very low-level (micro-architectural, “these two memory accesses hit the same cache line from multiple threads, so will be slow!”, “this branch is predictable, so better avoid CMOV!”).

But opinions differ on that, some people prefer code that doesn’t need an understanding of low level :wink:

This especially affects documentation: The authors of things like OhMyThreads docs are in a permanent conundrum about whether to stick to high-level rules of thumb, which exclude lots of good code, or to give the more nuanced low-level view, which excludes all programmers who don’t have at least a very basic knowledge of assembly / low-level.

I think the docstring you quoted is pretty ideal: They recommend something for newbies, together with references to learn more, and enough nuance for more seasoned users to know when to disregard the recommendation.
