Hi all, I’m getting inconsistent outputs when using the `Threads.@threads` macro on a loop, and I struggle to understand why it’s happening.
Here is an MWE function:
```julia
function coefs(N)
    v = zeros(ComplexF64, N)
    Threads.@threads for k in 1:N-1
        v_re = 0.5*k
        v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end
```
When run on a single thread, the output is always correct; with multiple threads it is only sometimes correct.
I understand that `Threads.@threads` can run into data races if one is not careful, but I really don’t see why that is happening here. The only thing all the threads access is the `v` vector, and each iteration only writes its output to a distinct element. Is this operation not safe?
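For concreteness, here is a self-contained sketch of how I compare against a serial reference (`coefs_serial` is just a name I picked for the single-threaded variant; the mismatch count is only occasionally nonzero, and only when Julia is started with more than one thread):

```julia
# Sketch: compare the threaded MWE against a serial reference.
# With JULIA_NUM_THREADS > 1 the mismatch count is sometimes nonzero.
function coefs(N)
    v = zeros(ComplexF64, N)
    Threads.@threads for k in 1:N-1
        v_re = 0.5*k
        v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end

function coefs_serial(N)
    v = zeros(ComplexF64, N)
    for k in 1:N-1
        v[k+1] = 0.5*k - 0.5*k*im   # no intermediate names needed
    end
    v[1] = 0.25 - 0.25im
    return v
end

ref = coefs_serial(1000)
count(coefs(1000) .!= ref)   # 0 when correct; sometimes > 0 with threads
```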
Disclaimer: I am not an expert in Threads.@threads or friends.
Applying any of the following changes makes it work correctly on v1.9.1:

- add a `local` in front of `v_re`, `v_im` inside the loop body
- rename `v_re`, `v_im` inside the loop body
- rename or remove `v_re`, `v_im` outside the loop
Someone correct me, but this smells like a bug in Threads.@threads.
EDIT: As clarified in the comments below, this is intended but unfortunate behavior. What is happening is that, by the current scoping rules, using the names `v_re`, `v_im` after the loop body as well makes them shared between the threads, which causes a data race.
Here is the fixed version:
```julia
function coefs_w_threads(N)
    v = zeros(ComplexF64, N)
    Threads.@threads for k in 1:N-1
        local v_re = 0.5*k
        local v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end
```
```julia
function coefs(N)
    v = zeros(ComplexF64, N)
    for k in 1:N-1
        v_re = 0.5*k
        v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end
```
```julia
ref = coefs(1000)
threads = coefs_w_threads(1000)
# display(sum(ref) == sum(threads))
display(all(ref .== threads)) # more reliable test
```
EDIT: I had added the local to the coefs version instead of coefs_w_threads …
So it seems the `v_re` and `v_im` variables are not local to the iterations but local to the overall function, because they are also assigned after the for loop. This is contrary to my expectations: I expected the subsequent assignments to create entirely separate variables from the ones local to the iterations. This behavior is on v1.8; can someone check whether it also happened in earlier versions?
```julia
julia> let
           for i in 1:3 k = i+1 end # k belongs to the iteration scope
           println(k)
       end
ERROR: UndefVarError: k not defined
Stacktrace:
 [1] top-level scope
   @ ./REPL[1]:5

julia> let
           for i in 1:3 k = i+1 end
           println(k)
           local k # k belongs to the let scope with a declaration or assignment
       end
4
```
Then I suppose I remembered incorrectly and this is how Julia has always behaved. Maybe the docs on the scoping rules could be clearer: a natural interpretation of “If x is already a local variable, then the existing local x is assigned” is to use line order to determine which variables exist at a given line, but the whole local scope really exists at once.
Indeed, I actually think this is consistent with the scoping rules, but tricky in this example. If one had assigned `v[1]` before the loop, the problem would have been evident.
I think this issue is unrelated to the `@threads` problem observed above: inside the loop body the values of `v_re`, `v_im` are always computed independently of 1) any other iteration and 2) whatever is computed after the loop.
Okay, this is why I may have been confused before. If a local variable exists but has not been assigned a value by a given line, it is considered not defined at that line.
```julia
julia> let
           println(k)
           k = 0
       end
ERROR: UndefVarError: k not defined
Stacktrace:
 [1] top-level scope
   @ REPL[8]:2

julia> let
           local k
           println(k)
       end
ERROR: UndefVarError: k not defined
Stacktrace:
 [1] top-level scope
   @ REPL[9]:3
```
I’m not really sure myself, but if data races occur because threads write to shared memory like `v`, then I would expect data races to also be possible with variables local to the function, since the iterations would share those.
No, not really; it is not related to the scheduler. The situation is equivalent to:
```julia
julia> function foo(N)
           x = [ (0,0) for _ in 1:N ]
           a, b = (0,0)
           Threads.@threads for i in eachindex(x)
               a, b = (i,i)
               x[i] = (a,b)
           end
           return sum(el -> el[1], x) == N*(N+1)/2
       end
foo (generic function with 1 method)

julia> foo(10^5)
false
```
The `a` and `b` variables are modified concurrently by each thread and may be updated wrongly. Whether they are assigned before or after the loop doesn’t matter, because they are locals of the function in both cases (and that is what is confusing when one only assigns the names after the loop). Since the time between the update of `a` and `b` and their use is very short, the race may not be noticeable, but it can happen (and does happen here).
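For completeness, a sketch of the fix for this example: declaring `a` and `b` as `local` inside the loop body gives each iteration its own pair, so no thread touches the function-level variables (`foo_fixed` is just a name I picked).

```julia
# Same shape as foo above, but with per-iteration locals so the
# threads no longer share a and b.
function foo_fixed(N)
    x = [(0, 0) for _ in 1:N]
    a, b = (0, 0)                  # function-level locals, as before
    Threads.@threads for i in eachindex(x)
        local a, b                 # fresh pair for every iteration
        a, b = (i, i)
        x[i] = (a, b)              # each iteration writes its own index
    end
    return sum(el -> el[1], x) == N * (N + 1) / 2
end

foo_fixed(10^5)  # true, regardless of the number of threads
```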
I get what’s happening now. That said, though I haven’t gone through the scope docs in a while, I personally think there should be some kind of warning about this behaviour, at least in the multithreading docs. Although it makes sense once you know what’s happening, it is definitely counter-intuitive behaviour with the potential to easily catch people off guard.
It feels like the same realm as not being able to conditionally define methods for functions in local scopes: every method just exists at once. Issue 15602 states that dynamically creating functions is too slow and that the current behavior allows functions to be hoisted to top level (like a functor). That rationale wouldn’t apply to local variable existence, and I’m guessing lowering could be changed to differentiate the latter `v_re` and `v_im` without a performance penalty, but both behaviors seem rooted in how the local scope is built during lowering. Another unintuitive behavior is how global variables act like references rather than values in function inputs.
Example of conditional methods not working in a local scope
```julia
julia> foofalse = let
           foo() = 0
           if false foo(x) = 1 end
           foo
       end
(::var"#foo#1") (generic function with 2 methods)

julia> foo() = 0
foo (generic function with 1 method)

julia> if false foo(x) = 1 end

julia> foo
foo (generic function with 1 method)

julia> if true foo(x) = 1 end
foo (generic function with 2 methods)
```
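By contrast, conditionally binding an anonymous function to a variable does follow control flow, since that is ordinary assignment rather than method definition; a small sketch (`picked` and `flag` are names I made up for illustration):

```julia
# Assigning anonymous functions to a variable picks exactly one
# behavior at runtime, unlike conditional named-method definitions.
picked = let flag = false
    f = x -> 0
    if flag
        f = x -> 1    # this rebinding only happens if the branch runs
    end
    f
end

picked(42)  # 0, since flag was false
```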