How to implement multi-threading with external in-place mutable variables?

mlanghinrichs · June 9, 2021, 7:55am

Hey,
I’m wondering how to implement a multi-threaded method that also mutates external variables in place. The idea is that each thread (or base case) gets its “own” copy of the external variable, so that the mutations are free of data races.

One solution I came up with is based on FLoops.jl (see below). The @floop macro can run the for loop on a single thread or on multiple threads (ex=SequentialEx() or ex=ThreadedEx(), respectively). The @init macro, together with deepcopy(), should make mutating the variable free of data races. Is the code correct and safe?

using FLoops

# Busy-wait for `t` seconds to emulate some CPU-bound "runtime"
# (unlike `sleep`, this keeps the thread occupied)
function sleep2(t)
    b = time()
    while time() < b + t
    end
end

function FL_ex3!(y, varexternal, niter; ex=ThreadedEx())
   @floop ex for i in 1:niter
       @init c = deepcopy(varexternal)
       sleep2(0.25)
       c .= (1.0*i, 2.0*i, 3.0*i)
       y[i, :] .= c
   end
   y
end

### sequential and multi-threading give the same output (as a check)
FL_ex3!(zeros(10, 3), zeros(3), 10, ex=SequentialEx()) == FL_ex3!(zeros(10, 3), zeros(3), 10)
# true

### compare performance multi-threaded vs. single-threaded
Threads.nthreads() # 10

using BenchmarkTools
@btime FL_ex3!(zeros(10, 3), zeros(3), 10)
# 250.124 ms (150 allocations: 10.45 KiB)

@btime FL_ex3!(zeros(10, 3), zeros(3), 10, ex=SequentialEx())
# 2.500 s (7 allocations: 960 bytes)

Multi-threading is a bit new to me, so sorry if the terminology is a bit off. Would be very glad for input!

Skoffer · June 9, 2021, 8:46am

Just in case, have you seen this tutorial: “Tutorial: Efficient and safe approaches to mutation in data parallelism”?

mlanghinrichs · June 9, 2021, 10:39am

This is a great resource, thanks!
There are two points that are still a bit unclear for me:

1. In the tutorial and docs, the @init macro is always used in combination with both @floop and @reduce. I’m wondering if it also works with @floop alone, as in my function.
2. The parallel pattern in my function above is of the kind “filling pre-allocated output” described in the tutorial (the output is y in my case). The tutorial says that this pattern may be unsafe for views, BitArrays, SparseMatrixCSC/SparseVector, Dict, and potentially more, but indicates that it is data race-free for Arrays. I’m wondering if this holds for any array of a custom element type, such as y = Vector{SomeType}(), when the vector is indexed/mutated only at the outermost level (y[i]).

Glad for any help!

tkf · June 11, 2021, 8:11pm

Yes, @init works for this purpose. It looks safe to me.
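To make the "one private copy per task" idea concrete, here is a rough Base-only sketch of the same pattern. It is not how FLoops implements @init internally, and `fill_rows!` and its `ntasks` keyword are hypothetical names made up for this illustration. It assumes, as in the original function, that every row `i` of `y` is written by exactly one task.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical Base-only counterpart of FL_ex3! (illustration only).
function fill_rows!(y, varexternal, niter; ntasks = nthreads())
    # Split 1:niter into contiguous chunks, roughly one per task.
    chunksize = max(1, cld(niter, ntasks))
    chunks = collect(Iterators.partition(1:niter, chunksize))
    tasks = map(chunks) do chunk
        @spawn begin
            # Private scratch buffer: each task mutates its own copy only.
            c = deepcopy(varexternal)
            for i in chunk
                c .= (1.0 * i, 2.0 * i, 3.0 * i)
                y[i, :] .= c   # row i belongs to exactly one chunk
            end
        end
    end
    foreach(wait, tasks)       # rethrows if any task failed
    return y
end

v = zeros(3)
y = fill_rows!(zeros(10, 3), v, 10)
```

The caller's `v` is never written to, since every task works on its own `deepcopy`.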

Filling y[i] is data race-free if SomeType and its components are immutable or a union of immutable types (isbitstype or Base.isbitsunion). If SomeType is mutable, it’s slightly tricky. Consider:

x1 = Ref(0)
xs = [x1, x1]

Now, concurrent update of the memory location of x1 (e.g., xs[1][] = 1 in task 1 and xs[2][] = 2 in task 2) is a data race even though they use different indices for xs. On the other hand, updating xs[i] itself (e.g., xs[1] = Ref(1) in task 1 and xs[2] = Ref(2) in task 2) is not a data race.
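The aliasing can be checked directly in a single-threaded session. The helper `all_distinct` below is a hypothetical name for this sketch, and it only detects elements that are the identical object at the top level, not objects that share nested mutable fields.

```julia
x1 = Ref(0)
xs = [x1, x1]

# Both slots hold the very same object, so a write through one index
# is visible through the other.
same_object = xs[1] === xs[2]   # true
xs[1][] = 1
seen_via_second = xs[2][]       # 1

# Hypothetical helper: true if no two elements are the identical object.
all_distinct(v) = all(v[i] !== v[j] for i in eachindex(v) for j in eachindex(v) if i < j)

# A comprehension allocates a fresh Ref per element, so nothing is shared.
ys = [Ref(0) for _ in 1:2]

all_distinct(xs)   # false
all_distinct(ys)   # true
```

Note that `fill(Ref(0), n)` also puts the same object into every slot, so it has the same aliasing problem as `[x1, x1]`.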

mlanghinrichs · June 14, 2021, 11:30am

Thanks a lot, extremely helpful! My SomeType is an immutable struct with tuples as fields, so this should be covered by your first case.
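A quick way to confirm this for a concrete type is to ask Julia directly. The two structs below are hypothetical stand-ins, not the actual types from this thread, and the tuple fields are assumed to have concrete isbits element types.

```julia
# Hypothetical stand-in: immutable struct whose fields are tuples of plain numbers.
struct SomeType
    a::NTuple{3, Float64}
    b::Tuple{Int, Int}
end

# Hypothetical counterexample: immutable struct that holds a mutable Vector.
struct Holder
    data::Vector{Float64}
end

isbitstype(SomeType)                          # true
Base.isbitsunion(Union{SomeType, Nothing})    # true
isbitstype(Holder)                            # false: the Vector can be shared and mutated
```

For a type like `Holder`, the aliasing caveat from the `Ref` example applies to the `data` field even though the struct itself is immutable.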