Scope in multithreading

a rather basic question about scope in multithreading:

a = 0.0
Threads.@threads for i in 1:10
    b = fun(i)
    a += b
end

in the above code:

  1. is a global and shared by all threads?
  2. is b local to each thread? i.e. a separate b is maintained inside each thread?

thanks

  1. Yes
  2. Yes

Those are true for any loop, actually.

(of course, note that a may end up being updated incorrectly there)

thanks.

what confuses me is that “scope” seems quite different when moving from a simple single thread to multiple threads… and I cannot find any documentation that addresses this topic specifically…

Not really in that case. Each iteration of the loop creates a new scope, and newly defined variables inside the loop iteration are local. The variables of the outer scope of the loop are shared by the loop iterations. This is the same in both cases.

The problem with the multi-threading there is that the iterations of the loop won’t necessarily run in sequence and may try to update a concurrently, so you can get the wrong result; you need to make access to a safe.


it’s expected and not a problem

but this one sounds like a problem. what does “concurrently” mean? does it mean some of the += operations may be interrupted and not complete?

if it’s the case, please kindly advise how to “safely” access a. Thanks.

It means that two threads may try to update the variable at the same time: both read the same value from memory, so one thread’s update overwrites the other’s and the result is wrong.
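As an aside, for a scalar accumulator like this one, a safe (if not the fastest) option is an atomic accumulator from Base’s `Threads` module. A sketch, using `sin(i)` as a stand-in for the `fun(i)` of the original example:

```julia
# Sketch: safe concurrent accumulation into a scalar with Threads.Atomic.
# atomic_add! performs the read-modify-write as one indivisible operation,
# so no update is lost.
a = Threads.Atomic{Float64}(0.0)
Threads.@threads for i in 1:100
    b = sin(i)
    Threads.atomic_add!(a, b)
end
a[]  # ≈ sum(sin, 1:100)
```

Note that the summation order differs between runs, so the result agrees with the serial sum only up to floating-point rounding.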

This is a simple pattern that does the correct thing:

julia> ntasks = Threads.nthreads()
       at = zeros(ntasks)
       Threads.@threads for it in 1:ntasks
           for i in it:ntasks:100 # simple splitter
               b = sin(i)
               at[it] += b
           end
       end
       a = sum(at)
-0.1271710136604196

julia> sum(sin(i) for i in 1:100)
-0.12717101366041972

The key is to split a into independent variables to be updated by each thread independently, and reduce the result at the end.

But you can also use FLoops.jl, ThreadsX.jl, Tullio.jl, or other less “manual” strategies for multi-threading.


split-and-reduce sounds good.

what about below? is it correct? is it the “formal” method?

lk = ReentrantLock()
a = 0.0
Threads.@threads for i in 1:10
    b = fun(i)

    lock(lk)
    try
        a += b
    finally
        unlock(lk)
    end
end

It works, but it will be slower (because of the lock contention).


It depends on what you do outside of the code you write. The following code is data race free (given some other reasonable assumptions like fun(i) does not introduce data races, ok itself is not a closure, etc.):

function ok()
    lk = ReentrantLock()
    a = 0.0
    Threads.@threads for i in 1:10
        b = fun(i)

        lock(lk)
        try
            a += b
        finally
            unlock(lk)
        end
    end
end

On the other hand, the following function has a data race:

function bad()
    b = nothing  # added
    lk = ReentrantLock()
    a = 0.0
    Threads.@threads for i in 1:10
        b = fun(i)

        lock(lk)
        try
            a += b
        finally
            unlock(lk)
        end
    end
end

But the ok function is still bad. Don’t use a lock for reduction. Also, a is captured by the closure and boxed, so updating it is not type stable. So fun has to be very slow for multi-threading to be beneficial in this code.
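A lock-free, type-stable pattern using only Base threading is to give each task its own local accumulator, return the partial sum from the task, and reduce the fetched results. This is a sketch (the helper name `psum` and the splitting scheme are illustrative, not a standard API):

```julia
# Sketch: reduction without locks or shared mutable state.
# Each task keeps a task-local accumulator and returns its partial sum;
# the shared, boxed variable `a` from the lock version disappears entirely.
function psum(f, n; ntasks = Threads.nthreads())
    tasks = map(1:ntasks) do t
        Threads.@spawn begin
            s = 0.0              # task-local, concretely typed accumulator
            for i in t:ntasks:n  # same simple splitter as above
                s += f(i)
            end
            s
        end
    end
    return sum(fetch, tasks)     # reduce the partial sums at the end
end

psum(sin, 100)  # ≈ sum(sin, 1:100)
```

Because each `s` lives entirely inside one task, there is nothing to lock and nothing for the closure to box.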

FYI, for a more high-level overview, I wrote tutorials like A quick introduction to data parallelism in Julia and Efficient and safe approaches to mutation in data parallelism to make parallelism easy. If you like these tutorials, you can look at How to avoid Box · FLoops to avoid the problems I described above.

It is very important to note that this requires a commutative (and associative) operator. However, parallel reduction only requires associativity and there are many useful non-commutative reductions (e.g., concatenation).
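A concrete sketch of a non-commutative but associative reduction: concatenation. If each task handles a contiguous chunk and the partial results are combined in chunk order, the parallel result matches the serial one (the helper name `pcollect` is illustrative):

```julia
# Sketch: parallel concatenation (non-commutative, but associative).
# Contiguous chunks plus left-to-right reduction preserve element order.
function pcollect(f, n; ntasks = Threads.nthreads())
    chunks = Iterators.partition(1:n, cld(n, ntasks))
    tasks = map(chunks) do chunk
        Threads.@spawn map(f, chunk)   # each task builds its own vector
    end
    return reduce(vcat, fetch.(tasks)) # fetch in chunk order, then vcat
end

pcollect(i -> i^2, 10)  # == [1, 4, 9, …, 100]
```

Summing the same values would work with the chunks in any order; concatenation would not, which is exactly the commutative/associative distinction above.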


why? shouldn’t it be type stable, given that b is also a Float64?