Hi all, I’m getting inconsistent outputs when using the `Threads.@threads` macro on a loop, and I struggle to understand why it’s happening.
Here is an MWE function:
```julia
function coefs(N)
    v = zeros(ComplexF64, N)
    Threads.@threads for k in 1:N-1
        v_re = 0.5*k
        v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end
```
When run on a single thread, the output is always correct; with multiple threads it is only sometimes correct.
I understand that `Threads.@threads` can run into data races if one is not careful, but I really don’t see why that is happening here. The only thing all the threads access is the `v` vector, and each iteration only writes its output to a distinct element. Is this operation not safe?
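For concreteness, here is a self-contained sketch of how I compare against a serial reference (`coefs_serial` is just a name I picked for the single-threaded variant; the mismatch count is only occasionally nonzero, and only when Julia is started with more than one thread):

```julia
# Sketch: compare the threaded MWE against a serial reference.
# With JULIA_NUM_THREADS > 1 the mismatch count is sometimes nonzero.
function coefs(N)
    v = zeros(ComplexF64, N)
    Threads.@threads for k in 1:N-1
        v_re = 0.5*k
        v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end

function coefs_serial(N)
    v = zeros(ComplexF64, N)
    for k in 1:N-1
        v[k+1] = 0.5*k - 0.5*k*im   # no intermediate names needed
    end
    v[1] = 0.25 - 0.25im
    return v
end

ref = coefs_serial(1000)
count(coefs(1000) .!= ref)   # 0 when correct; sometimes > 0 with threads
```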
Disclaimer: I am not an expert in Threads.@threads or friends.
Applying any of the following changes makes it work correctly on v1.9.1:

- add a `local` in front of `v_re`, `v_im` inside the loop body
- rename `v_re`, `v_im` inside the loop body
- rename or remove `v_re`, `v_im` outside the loop
Someone correct me, but this smells like a bug in Threads.@threads.
EDIT: As clarified in the comments below, this is intended but unfortunate behavior. What is happening is that, by the current scoping rules, using the names `v_re`, `v_im` after the loop body as well makes them shared between the threads, which causes a data race.
Here is the fixed version:
```julia
function coefs_w_threads(N)
    v = zeros(ComplexF64, N)
    Threads.@threads for k in 1:N-1
        local v_re = 0.5*k
        local v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end
```
```julia
function coefs(N)
    v = zeros(ComplexF64, N)
    for k in 1:N-1
        v_re = 0.5*k
        v_im = -0.5*k
        v[k+1] = v_re + 1im*v_im
    end
    v_re = 0.25
    v_im = -0.25
    v[1] = v_re + 1im*v_im
    return v
end
```
```julia
ref = coefs(1000)
threads = coefs_w_threads(1000)
# display(sum(ref) == sum(threads))
display(all(ref .== threads)) # more reliable test
```
EDIT: I had added the local to the coefs version instead of coefs_w_threads …
So it seems the `v_re` and `v_im` variables are not local to the iterations but local to the overall function, because they are also assigned after the for loop. This is contrary to my expectations: I expected the subsequent assignments to create entirely separate variables from the ones local to the iterations. This behavior is on v1.8; can someone check whether it also happened in earlier versions?
```julia
julia> let
           for i in 1:3 k = i+1 end # k belongs to the iteration scope
           println(k)
       end
ERROR: UndefVarError: k not defined
Stacktrace:
 [1] top-level scope
   @ ./REPL[1]:5

julia> let
           for i in 1:3 k = i+1 end
           println(k)
           local k # k belongs to the let scope with a declaration or assignment
       end
4
```
Then I suppose I remembered incorrectly and this is how Julia has always behaved. Maybe the docs on the scoping rules could be clearer: a natural interpretation of “If x is already a local variable, then the existing local x is assigned” is to use line order to determine which variables exist at a given line, but the whole local scope really exists at once.
Indeed, I actually think this is consistent with the scoping rules, but tricky in this example. If one had assigned `v[1]` before the loop, the problem would have been evident.
I think this issue is unrelated to the `@threads` problem observed above: inside the loop body the values of `v_re`, `v_im` are always computed independently of 1) any other iteration and 2) whatever is computed after the loop.
Okay, this is why I may have been confused before. If a local variable exists but has not been assigned a value by a given line, it is considered not defined at that line.
```julia
julia> let
           println(k)
           k = 0
       end
ERROR: UndefVarError: k not defined
Stacktrace:
 [1] top-level scope
   @ REPL[8]:2

julia> let
           local k
           println(k)
       end
ERROR: UndefVarError: k not defined
Stacktrace:
 [1] top-level scope
   @ REPL[9]:3
```
I’m not really sure myself, but if data races occur because threads write to shared memory like `v`, then I would expect data races to also be possible with variables local to the function, since the iterations would share those.
No, not really; it is not related to the scheduler. The situation is equivalent to:
```julia
julia> function foo(N)
           x = [ (0,0) for _ in 1:N ]
           a, b = (0,0)
           Threads.@threads for i in eachindex(x)
               a, b = (i,i)
               x[i] = (a,b)
           end
           return sum(el -> el[1], x) == N*(N+1)/2
       end
foo (generic function with 1 method)

julia> foo(10^5)
false
```
The `a` and `b` variables are modified concurrently by each thread and may be updated wrongly. Whether they are assigned before or after the loop doesn’t matter, because they are locals of the function in both cases (and that is what is confusing when one only assigns the names after the loop). Since the time between the update of `a` and `b` and their use is very short, the race may not be noticeable, but it can happen (and does happen here).
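For completeness, a sketch of the fix for this example: declaring `a` and `b` as `local` inside the loop body gives each iteration its own pair, so no thread touches the function-level variables (`foo_fixed` is just a name I picked).

```julia
# Same shape as foo above, but with per-iteration locals so the
# threads no longer share a and b.
function foo_fixed(N)
    x = [(0, 0) for _ in 1:N]
    a, b = (0, 0)                  # function-level locals, as before
    Threads.@threads for i in eachindex(x)
        local a, b                 # fresh pair for every iteration
        a, b = (i, i)
        x[i] = (a, b)              # each iteration writes its own index
    end
    return sum(el -> el[1], x) == N * (N + 1) / 2
end

foo_fixed(10^5)  # true, regardless of the number of threads
```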
I get what’s happening now. That said, though I haven’t gone through the scope docs in a while, I personally think there should be some kind of warning about this behaviour, at least in the multithreading docs. Although it makes sense once you know what’s happening, it is definitely counter-intuitive behaviour with the potential to easily catch people off guard.
It feels like the same realm as not being able to conditionally define methods for functions in local scopes: every method just exists at once. Issue 15602 states that dynamically creating functions is too slow and that the current behavior allows functions to be hoisted to top level (like a functor). That rationale wouldn’t apply to local variable existence, and I’m guessing lowering could be changed to differentiate the latter `v_re` and `v_im` without a performance penalty, but both behaviors seem rooted in how the local scope is built during lowering. Another unintuitive behavior is how global variables act like references rather than values in function inputs.
Example of conditional methods not working in a local scope
```julia
julia> foofalse = let
           foo() = 0
           if false foo(x) = 1 end
           foo
       end
(::var"#foo#1") (generic function with 2 methods)

julia> foo() = 0
foo (generic function with 1 method)

julia> if false foo(x) = 1 end

julia> foo
foo (generic function with 1 method)

julia> if true foo(x) = 1 end
foo (generic function with 2 methods)
```
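By contrast, conditionally binding an anonymous function to a variable does follow control flow, since that is ordinary assignment rather than method definition; a small sketch (`picked` and `flag` are names I made up for illustration):

```julia
# Assigning anonymous functions to a variable picks exactly one
# behavior at runtime, unlike conditional named-method definitions.
picked = let flag = false
    f = x -> 0
    if flag
        f = x -> 1    # this rebinding only happens if the branch runs
    end
    f
end

picked(42)  # 0, since flag was false
```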