Task migration & data-race freedom

In light of task migration,

Tasks can now migrate among threads when they are re-scheduled. Previously, a Task would always run on whichever thread executed it first

Can I still assume that f and g will never run concurrently on different threads? Do I need to ensure that f and g are data-race free with each other?

@async f(x)
g(x)

Data-race freedom is not guaranteed even with single-threaded code, because code can yield to other tasks on the same thread. @async can help prevent execution on other threads (so f and g will execute on the same thread in your example), but proper usage of locking and other synchronization primitives may be required to ensure that you don’t have races between f and g.

1 Like

This sounds somehow wrong to me. Don’t you need at least two concurrently running threads to construct a data race? Or are you talking about something strange like signal handlers here?

I suppose maybe “race condition” is a better term here; still, race conditions are quite common and easy to trigger, so I tend to fold them in with “data race”. An example would be in-place updating an array while another task is reading from that array; a yield in-between writes could cause a race, and thus undesirable behavior (even if individual array values are themselves valid).

1 Like

Ah, I think I understand: we can’t easily predict in which order tasks are scheduled in cooperative multitasking. Thanks for the explanation!

1 Like

That’s right. Single-thread (Threads.nthreads() == 1) Julia process cannot have data races by definition if we define data race as the C/C++ memory model does; i.e., defining it with respect to “OS” threads. Since we use LLVM which is primarily used for writing the C/C++ compiler, I think it is a reasonable basis in Julia. I totally agree that it is crucial to recognize that a Julia program can have some kind of races due to yield points even if Threads.nthreads() == 1. But, I think it’s important to use specific terminology for this type of race (or at least not call it data race) to avoid confusion because the C/C++ memory model has rigorous mathematical underpinning, and discussing safety of concurrent programs (especially multi-threaded) requires handling subtle details. Using the memory model terminology, every piece of code interleaved at yield points has sequenced-before edge hence is ordered properly. That said, since Julia’s coroutine implementation uses mechanisms (inline assembly) outside of the C/C++ abstract machine, my argument may be incorrect.

Cooperative multitasking is not an issue here. In fact, the most implementation of cooperative multitasking in practical languages “async/await” keywords, and the programmer have to invoke async functions using some kind of await syntax. In such a language, the yield points are syntactically clear. The problem is rather that Julia doesn’t have explicit await keywords to indicate that a certain type of race may happen.

2 Likes

It’s best to avoid thinking of “race” and “data race” as the same thing or subsets of one another in either direction. The fact the names are almost the same is really unfortunate!

Race conditions can bugs but they can also be benign and even intentional with well-defined semantics.

But data races are generally bugs; it’s very difficult to look at a given data race and declare it benign without knowing the implementation detail of both the compiler and physical CPU.

Why are data races extra bad? It’s because they violate the shared assumptions of the programmer, compiler and hardware that, by default, any observable state change is caused by a single thread of execution. Various (many? most?) transformations in both the compiler and potentially the CPU require data race freedom. If you have data races the execution can observe “impossible” states — states which can’t be interpreted according to any possible sequence of events in the original program. (To understand such impossible states might require knowledge of the physical hardware. For example, reasoning about decoherence of caches attached to different CPUs.)

A glib summary:

  • Race condition: program state depends on the order of concurrent execution
  • Data race: Something executes, but it’s not your program

Technically, data race is undefined behavior and always is a bug (if we follow the C/C++ memory model and I think we should)

For the record, I also hate the word “benign race” :wink: