This is a toy example of the problem. I want to sum a bunch of number is parallel, using a lock to reduce.
Disconsidering the fact that the lock here is of course detrimental to performance (the actual use case is different), is this the correct way to use one? Why is the output type not inferred? Anything I can do about it?
julia> function test(x)
s = zero(eltype(x))
lk = ReentrantLock()
@sync for i in 1:1000
Threads.@spawn begin
spart = sum(@view(x[8*(i-1)+1:8*i]))
lock(lk) do
s += spart
end
end
end
return s
end
test (generic function with 1 method)
julia> @code_warntype test(ones(Int,8000))
MethodInstance for test(::Vector{Int64})
from test(x) in Main at REPL[11]:1
Arguments
#self#::Core.Const(test)
x::Vector{Int64}
Locals
lk::ReentrantLock
s@_4::Core.Box
@_5::Union{Nothing, Tuple{Int64, Int64}}
v::Nothing
sync#41::Channel{Any}
i::Int64
#15::var"#15#17"{Vector{Int64}, ReentrantLock, Int64}
task::Task
s@_11::Union{}
Body::Any
1 ─ (s@_4 = Core.Box())
│ %2 = Main.eltype(x)::Core.Const(Int64)
│ %3 = Main.zero(%2)::Core.Const(0)
From what I understand https://github.com/JuliaLang/julia/pull/41449 is a prerequisite for inferring @spawn (which is not in 1.7). And even with this, further work is needed to make it happen.
In this case however I believe the problem is that Threads.@spawn as well as the lock(ok) do ... pattern create a closure. Since you assign to s from both outside as well as inside the closure, you are running into
In the real case s is a complicated structure, as I understand that would not apply.
But the problem is independent of the lock (although of course there the result is wrong):
julia> function test(x)
s = zero(eltype(x))
@sync for i in 1:1000
Threads.@spawn begin
spart = sum(@view(x[8*(i-1)+1:8*i]))
s += spart
end
end
return s
end
test (generic function with 1 method)
julia> @code_warntype test(ones(Int,8000))
MethodInstance for test(::Vector{Int64})
from test(x) in Main at REPL[13]:1
Arguments
#self#::Core.Const(test)
x::Vector{Int64}
Locals
s@_3::Core.Box
@_4::Union{Nothing, Tuple{Int64, Int64}}
v::Nothing
sync#41::Channel{Any}
i::Int64
#3::var"#3#4"{Vector{Int64}, Int64}
task::Task
s@_10::Union{}
Body::Any
1 ─ (s@_3 = Core.Box())
As I explained above, Threads.@spawn also creates a closure. For more complicated structures, you might want to have a look at the newly added per-field atomics. If you insist on using the explicit lock pattern here though, you could also just make s a Ref.
Uhm… that is an idea, didn’t think about that one. (nothing against the new per-field atomic, just not sure if supporting only 1.7 is a good idea now). I am experimenting with some alternatives for now, thanks for the hint.