PSA: Reasoning about scope rules and multithreading

heliosdrm · March 11, 2021, 7:06pm

Perhaps this has been discussed before and I missed it, but during my first experiments with multithreading I suddenly realized that Julia’s scoping rules in loops, which a number of people find confusing (including myself when I first learnt about them), are really convenient in that context.

One of the things that had made me wary of multithreading was the fear of corrupting variable values when iterations run in god-knows-which order (I had not learnt the meaning of “race condition” until recently, but I had an intuition of the concept). When I finally decided to go for it in Julia, I was comforted to realize that each iteration creates its own scope, so I only had to take care of the variables defined outside the loop: I can trust that variables defined exclusively inside the loop can’t be “touched” by any other iteration running in parallel.
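A minimal sketch of that idea, assuming a hypothetical per-iteration helper `square_plus_one`: `tmp` is assigned only inside the `Threads.@threads` loop body, so each iteration gets its own binding, while `out` is defined outside the loop and is shared. Writing to distinct indices of `out` is what keeps the shared part safe here.

```julia
using Base.Threads

# Hypothetical helper standing in for "some work done per iteration".
square_plus_one(x) = x^2 + 1

function per_iteration_locals(n)
    out = zeros(Int, n)           # defined outside the loop: shared by all iterations
    @threads for i in 1:n
        tmp = square_plus_one(i)  # assigned only inside the body: a fresh local per iteration
        out[i] = tmp              # each iteration writes its own index, so there is no race
    end
    return out
end

per_iteration_locals(8)
```

The same function would become racy if `tmp` were also assigned somewhere in `per_iteration_locals` before the loop, which is exactly the situation described in the next post.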

@StefanKarpinski once said that the main motivation for Julia’s scoping rules was closures. But even if this advantage for multithreading is only a nice side effect, I think it is worth mentioning. Probably nobody will care about either closures or multithreading on their first day with Julia, but the benefit of keeping variables separate between iterations in threaded loops is easy to explain and understand; in my opinion, even easier than the advantages that Julia’s scope rules have for creating closures.

tkf · March 11, 2021, 9:54pm

Regarding scoping and concurrent programming, allow me to bring up yet another (cautionary) PSA.

tl;dr Julia’s closure is great. But you still have to be careful about assignments.

Consider the following type of code that has no race at the moment:

function bigfunction(...)
    ...
    # very long lines of code
    ...
    @sync for x in xs
        @async begin
            y = f(x)
            g(h(x), y)
        end
    end
end

You might tweak this code later by adding some innocent-looking code outside of the portion using @async:

function bigfunction(...)
    if ...
        y = ...  # added
    end
    ...
    # very long lines of code
    ...
    @sync for x in xs
        @async begin
            y = f(x)
            g(h(x), y)
        end
    end
end

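This now introduces a data race: because y is now a local variable of bigfunction, the y = f(x) inside the @async block assigns to that same outer variable instead of creating a new one, so y is mutated concurrently by multiple tasks. It would be very difficult to catch this in code review if there are many lines between the newly added code and the task-spawning portion.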

Note that it does not matter whether you use threading for this example. Even with @async alone, you have an incorrect program (unless you really mean to mutate y from different tasks). It’s just that detecting and debugging the bug is much harder with @spawn.
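The hazard can be reproduced on a single thread with plain `@async`. In this sketch `f`, `g` and `h` are hypothetical stand-ins for the functions in `bigfunction`, the conditional `y = ...` is simplified to an unconditional `y = 0`, and the explicit `yield()` is an assumption added only to force the tasks to interleave between writing and reading `y`.

```julia
# Hypothetical stand-ins for f, g and h from the post.
f(x) = 10x
h(x) = x
g(a, y) = (a, y)

function racy(xs)
    y = 0  # the "innocent" assignment added earlier in the function
    results = Vector{Tuple{Int,Int}}(undef, length(xs))
    @sync for (i, x) in enumerate(xs)
        @async begin
            y = f(x)   # assigns the *outer* y, which every task shares
            yield()    # let the other tasks run before y is read back
            results[i] = g(h(x), y)
        end
    end
    return results
end

racy([1, 2, 3])
```

Because all tasks are scheduled before any of them runs, each task's `yield()` hands control to the others, which overwrite the shared `y`, so the early tasks read back a value that does not match their own `f(x)`.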

You can make the above program correct again by using local (or let):

function bigfunction(...)
    if ...
        y = ...
    end
    ...
    # very long lines of code
    ...
    @sync for x in xs
        @async begin
            local y = f(x)
            g(h(x), y)
        end
    end
end
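A sketch of both fixes, again with hypothetical `f`, `g`, `h` and a forced `yield()` (both assumptions for illustration): `local y` declares a new `y` inside the task's closure, and `let y = f(x)` creates a fresh binding in the same way. In both versions the outer `y` is left untouched.

```julia
# Hypothetical stand-ins for f, g and h from the post.
f(x) = 10x
h(x) = x
g(a, y) = (a, y)

function fixed_local(xs)
    y = 0
    results = Vector{Tuple{Int,Int}}(undef, length(xs))
    @sync for (i, x) in enumerate(xs)
        @async begin
            local y = f(x)  # a new y scoped to this task's closure
            yield()         # other tasks run here, but cannot reach this y
            results[i] = g(h(x), y)
        end
    end
    return results, y       # the outer y is still 0
end

function fixed_let(xs)
    y = 0
    results = Vector{Tuple{Int,Int}}(undef, length(xs))
    @sync for (i, x) in enumerate(xs)
        @async let y = f(x) # `let` also introduces a fresh binding of y
            yield()
            results[i] = g(h(x), y)
        end
    end
    return results, y
end
```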

I think it would be better for the compiler to warn or throw an error when it finds this type of code. Meanwhile, you can avoid the problem by keeping @async/@spawn blocks as small as possible, or by always using let/local when you need assignments inside them.
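One way to keep the spawned block small, sketched here as an assumption rather than as the poster's recipe, is to move all assignments into a separate function (the hypothetical `compute` below) so that the `Threads.@spawn` expression is just a call. Variables assigned inside `compute` are locals of that function and cannot alias anything in the caller.

```julia
# Hypothetical worker: its temporaries are locals of this function,
# so they cannot alias variables of the caller.
function compute(x)
    y = 10x
    return (x, y)
end

function spawn_small(xs)
    y = 0  # an unrelated outer variable, like the one added in the post
    tasks = [Threads.@spawn compute(x) for x in xs]  # no assignments inside the spawned expression
    return fetch.(tasks), y
end

spawn_small(1:3)
```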

StefanKarpinski · March 29, 2021, 8:47pm

This was actually one of the motivations for how the scope rules were designed. Early on we were even considering automatic parallelization of comprehensions (we still might at some point!) or even for loops. We didn’t end up doing that, but if you’re going to even be able to consider it, you really want to avoid creating spurious variable dependencies between iterations. This dictates that loop iterations have their own scope so that locals assigned don’t inadvertently spill out and require synchronization, and it also dictates that each iteration has its own separate locals, rather than reusing them, since otherwise the value of a local from a previous iteration is visible from a later one, creating a temporal dependency, which would prevent parallelization.
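The "each iteration has its own separate locals" rule can be observed with closures. In this sketch (the function names are hypothetical), closures created in a loop each capture that iteration's `y`; once `y` is also assigned outside the loop, there is a single shared `y` and every closure sees its final value, which is the kind of cross-iteration dependency described above.

```julia
function capture_per_iteration(n)
    fs = Function[]
    for i in 1:n
        y = 10i             # a fresh binding of y on every iteration
        push!(fs, () -> y)  # each closure captures its own iteration's y
    end
    return [fn() for fn in fs]
end

function capture_shared(n)
    y = 0                   # y now belongs to the function, not the loop body
    fs = Function[]
    for i in 1:n
        y = 10i             # reassigns the single shared y
        push!(fs, () -> y)
    end
    return [fn() for fn in fs]
end

capture_per_iteration(3)  # [10, 20, 30]
capture_shared(3)         # [30, 30, 30]
```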

kristoffer.carlsson · March 29, 2021, 8:58pm

As a note, this is captured in JuliaLang/julia issue #14948 on GitHub: “Race condition caused by variable scope getting lifted from a multithreaded context”.
