Inconsistent results when using Threads.@threads in a loop

Hi all, I’m running into the problem of having inconsistent outputs when using the Threads.@threads macro on a loop and I struggle to understand why it’s happening.
Here is a MWE function

function coefs(N)
    v = zeros(ComplexF64, N)
    Threads.@threads for k in 1:N-1
        v_re = 0.5*k
        v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end

When run on a single thread, the output is always correct, e.g.

julia> coefs(10)
10-element Vector{ComplexF64}:
 0.25 - 0.25im
  0.5 - 0.5im
  1.0 - 1.0im
  1.5 - 1.5im
  2.0 - 2.0im
  2.5 - 2.5im
  3.0 - 3.0im
  3.5 - 3.5im
  4.0 - 4.0im
  4.5 - 4.5im

However, if I run Julia with multiple threads, sometimes the results are wrong in a seemingly random way, e.g.

julia> coefs(10)
10-element Vector{ComplexF64}:
 0.25 - 0.25im
  0.5 - 0.5im
  2.0 - 1.0im
  1.5 - 1.5im
  1.5 - 2.0im
  2.5 - 2.5im
  3.0 - 3.0im
  3.5 - 3.5im
  4.0 - 4.0im
  4.5 - 4.5im

I understand that Threads.@threads can run into data races if one is not careful, but I really don’t see why that is happening here. The only thing all threads are accessing is the v vector and just to write the output to it. Is this operation not safe?

Thanks for any help.

4 Likes

Disclaimer: I am not an expert in Threads.@threads or friends.

Applying any of the changes below makes it work correctly on v1.9.1:

  • add a local in front of v_re, v_im inside the loop body
  • rename v_re, v_im inside the loop body
  • rename or remove v_re, v_im outside the loop

Someone correct me, but this smells like a bug in Threads.@threads.
EDIT: As clarified in the comments below, this is intended but also bad behavior. What is happening is that by the current scoping rules, using the names v_re, v_im also after the loop body makes them being shared between threads, which causes a data race.


Here is the fixed version:

function coefs_w_threads(N)
    v = zeros(ComplexF64, N)
    Threads.@threads for k in 1:N-1
        local v_re = 0.5*k
        local v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end

function coefs(N)
    v = zeros(ComplexF64, N)
    for k in 1:N-1
        v_re = 0.5*k
        v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end

ref = coefs(1000)
threads = coefs_w_threads(1000)
# display(sum(ref) == sum(threads))
display(all(ref .== threads)) # more reliable test

EDIT: I had added the local to the coefs version instead of coefs_w_threads

5 Likes

The tuple version also shows the issue:

julia> function coefs(N)
           v = [ (0.0,0.0) for _ in 1:N]
           Threads.@threads for k in 1:N-1
               a = k
               b = k
               v[k+1] = (a,b)
           end
           a = 0.25
           b = 0.25
           v[1] = (a,b)
           return v
       end
coefs (generic function with 1 method)

julia> coefsA(10000)
10000-element Vector{Tuple{Float64, Float64}}:
 (0.25, 0.25)
 (1.0, 1.0)
 (2.0, 2.0)
 (3.0, 3.0)
 (4.0, 4.0)
 (5.0, 5.0)
 (6.0, 6.0)
 (7.0, 7.0)
 (8.0, 8.0)
 (9.0, 9.0)
 (10.0, 10.0)
 (11.0, 11.0)
 ⋮
 (2467.0, 9989.0)
 (9990.0, 9990.0)
 (9991.0, 9991.0)
 (9992.0, 9992.0)
 (9993.0, 9993.0)
 (9994.0, 9994.0)
 (9995.0, 9995.0)
 (9996.0, 9996.0)
 (9997.0, 9997.0)
 (6219.0, 9998.0)
 (9999.0, 9999.0)

So it seems like the v_re and v_im variables are not local to the iterations, but local to the overall function because they are assigned after the for loop. Which is contrary to my expectations, I expected the subsequent assignments to make entirely separate variables from ones local to the iterations. This behavior is on v1.8, can someone check if this also happened in earlier versions?

julia> let
         for i in 1:3  k = i+1  end # k belongs to iteration scope
         println(k)
       end
ERROR: UndefVarError: k not defined
Stacktrace:
 [1] top-level scope
   @ ./REPL[1]:5

julia> let
         for i in 1:3  k = i+1  end
         println(k)
         local k # k belongs to let scope with declaration or assignment
       end
4

2 Likes

This happens on v1.6.7,v1.3.1,v1.0.5,v0.6.4.

1 Like

Then I suppose I remembered incorrectly and this was always how Julia behaved. Maybe the docs on the scoping rules can be more clear, a natural interpretation of “If x is already a local variable , then the existing local x is assigned” is using line order to determine what variable exists by a given line, but the whole local scope really exists at once.

2 Likes

Indeed, actually I think this is consistent with the scoping rules, but catchy in this example. If one had defined v[1] before the loop, the problem would be evident.

That deserves an example in docs, IMO.

2 Likes

I think this issue is unrelated to the @threads problem observed above: Inside the loop body the values of v_re, v_im are always computed independent from 1) any other iteration and 2) from what is computed after the loop.

1 Like

Okay, this is why I may have been confused before. If the local variable exists but is not assigned a value by a line, it is considered not defined at that line.

julia> let
        println(k)
        k = 0
       end
ERROR: UndefVarError: k not defined
Stacktrace:
 [1] top-level scope
   @ REPL[8]:2

julia> let
        local k
        println(k)
       end
ERROR: UndefVarError: k not defined
Stacktrace:
 [1] top-level scope
   @ REPL[9]:3

I’m not really sure myself, but if data races occur because threads are writing to shared memory like v, then I would expect data races to be possible with variables local to the function, as the iterations would share those.

1 Like

And the behavior should be removed in a potential v2.0, IMO.

The problem persists even with a :static scheduler, which IIUC, should ensure a data race free execution in this case.

No, not really, it is not related to the scheduler. That is equivalent to:

julia> function foo(N)
           x = [ (0,0) for _ in 1:N ]
           a, b = (0,0)
           Threads.@threads for i in eachindex(x)
               a, b = (i,i)
               x[i] = (a,b)
           end
           return sum(el -> el[1], x) == N*(N+1)/2  
       end
foo (generic function with 1 method)

julia> foo(10^5)
false

The a and b variables are being modified concurrently by each thread, and may get updated wrongly. The fact that they are defined before or after the loop doesn’t mater, because they are locals of the function in both cases (and that is what is confusing, if ones assigns the label after the loop).

Since the time from the update of a and b and their use is very short, the concurrency may not be perceived, but it can happen (and does) here.

Indeed, sorry, I messed up k with nthreads().

This is bad but intended behavior. local is necessary.

3 Likes

I agree, this took me by surprise!

Thanks everyone.

I get what’s happening now. With that said, though I haven’t gone through the scope docs in a bit, for sure I personally think there should be some kind of warning/mention of this behaviour at least in the multithreading docs. Although it makes sense once you know what’s happening, it is definitely quite counter-intuitive behaviour that seems to have the potential of easily catching people off guard.

Thanks again!

1 Like

It feels like the same realm as not being able to conditionally define methods for functions in local scopes, every method just exists at once. Issue 15602 states that dynamically creating functions is too slow and this allows functions to be hoisted to top-level (like a functor). That rationale wouldn’t apply to local variable existence, and I’m guessing lowering could be changed to differentiate the latter v_re and v_im without performance penalty, but both seem rooted in how the local scope is built during compilation. Another unintuitive implementation is how global variables act like references rather than values in function inputs.

Example of conditional methods not working in a local scope
julia> foofalse = let
         foo() = 0
         if false  foo(x) = 1  end
         foo
       end
(::var"#foo#1") (generic function with 2 methods)

julia> foo() = 0
foo (generic function with 1 method)

julia> if false  foo(x) = 1  end

julia> foo
foo (generic function with 1 method)

julia> if true  foo(x) = 1  end
foo (generic function with 2 methods)