Locking in multithreading

Hi,

Here is a piece of code (not an MWE…)

function parallelkernel!(elca,idof,R,...) # fast typestable kernel
    @threads for iel ∈ eachindex(elca)
        r   = incremental(elca[iel],...)
        #lock(lk) do
        R[      idof[iel]] .+= r   # potential BUG race condition
        #end
    end
end

I think the code is quite uncomplicated: within a threaded loop I compute vectors r, which I add into a larger vector R.
Several threads can thus be writing to the same elements of R, which in my understanding is the classic "race condition" I need to avoid. I have run this code hundreds of times and always got the correct result, though (famous last words).

I believe I cannot use Atomic here, because r is not of a primitive type (right?), but Julia's manual mentions lock. However, if I uncomment the do block in the above code, I get LoadError: TaskFailedException.
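For contrast, here is the primitive-type case where Threads.Atomic does work, reducing a scalar without a lock (a minimal sketch, unrelated to the real kernel):

```julia
using Base.Threads

# Atomic accumulation works for primitive element types such as Int;
# Atomic{T} cannot hold a Vector, which is why it doesn't fit the OP's r.
function atomic_sum(n)
    acc = Atomic{Int}(0)
    @threads for i in 1:n
        atomic_add!(acc, i)   # atomic read-modify-write, no lock needed
    end
    return acc[]              # acc[] reads the wrapped value
end
```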

Possibly I have a subtle bug in my code, not visible in the simplified code above. In that case, I am on my own. What I would like to know is: did I completely misunderstand how lock is supposed to be used? The usage example in the manual does not show the context of a threaded loop; was I wrong to assume this pattern applies there?

In your piece of code, if you uncomment the lock(lk) do and end lines, you will get:

ERROR: TaskFailedException
....

    nested task error: UndefVarError: lk not defined

You need to define lk = ReentrantLock() before the loop. An MWE:

using Base.Threads

function sum_race(n)
    arr = Int[]
    @threads for i in 1:n
        push!(arr, i)
    end
    sum(arr)
end

function sum_lock(n)
    arr = Int[]
    lk = ReentrantLock()
    @threads for i in 1:n
        lock(lk) do 
            push!(arr, i)
        end
    end
    sum(arr)
end

The first gives you a race, the second does not:

julia> sum_race(1000)
ERROR: TaskFailedException
....
    nested task error: BoundsError: attempt to access 295-element Vector{Int64} at index [238]
....

julia> sum_lock(1000)
500500

If you have already defined lk, perhaps outside your function, then it may be something else.
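Applied to the kernel in the OP, the fix looks roughly like this. A sketch only: incremental here is a hypothetical scalar stand-in for the real function, and the real code indexes R with a vector of dofs rather than a single index.

```julia
using Base.Threads

# Hypothetical stand-in for the real, expensive incremental(...)
incremental(x) = 2x

function parallelkernel!(R, idof, elca)
    lk = ReentrantLock()        # defined before the loop, shared by all tasks
    @threads for iel in eachindex(elca)
        r = incremental(elca[iel])
        lock(lk) do
            R[idof[iel]] += r   # serialized: no race on R
        end
    end
    return R
end
```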


Hi Paul

I just missed the part about defining lk in the manual, simple as that!

This solves the issue, thank you!

FYI, reduction using a lock is often a bad idea: while the lock is held, R[idof[iel]] .+= r is not run in parallel. For the code in the OP to parallelize well, the time taken by R[idof[iel]] .+= r has to be very small relative to incremental(elca[iel],...). If you can afford to keep nthreads() copies of R in memory, it may make sense to use a solution without the lock. For more information, see:
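A sketch of that lock-free pattern, assuming a simplified scalar kernel (incremental is a hypothetical stand-in): each task accumulates into its own private copy of R, and the copies are merged serially at the end.

```julia
using Base.Threads

# Hypothetical stand-in for the real, expensive incremental(...)
incremental(x) = 2x

function accumulate_copies!(R, idof, elca)
    nt = nthreads()
    Rs = [zero(R) for _ in 1:nt]      # one private copy of R per task
    @threads for t in 1:nt
        Rt = Rs[t]
        for iel in t:nt:length(elca)  # static striding over the elements
            Rt[idof[iel]] += incremental(elca[iel])
        end
    end
    for Rt in Rs                      # serial merge; no lock anywhere
        R .+= Rt
    end
    return R
end
```

Each task writes only to its own Rs[t], so there is no shared mutable state during the parallel phase; the trade-off is nthreads() times the memory for R plus a serial reduction at the end.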


Hi Takafumi,

Yes, that makes sense. Most CPU time is spent in incremental, so it is not a disaster, but I can indeed see that more time is spent on this addition with the lock than when taking my chances without one. I actually tried exactly what you suggest; I somehow botched it and got bad performance, but I will have another go.

And thank you for the links, I will study them!