FLoops ThreadedEx scheduling

(Maybe aside: Typically, simply tweaking basesize is the best “very cheap” way to get “good enough” performance. For better load balancing, reduce basesize. If the computation per iteration is not big enough to amortize the task-spawning overhead, increase basesize. I should put this in a more prominent location in the manual.)
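To make the basesize knob concrete, here is a minimal sketch. The names `work` and `total` are made up for illustration, and the workload is an arbitrary unbalanced one that I picked, not something from the question. The only FLoops pieces used are `@floop`, `@reduce`, and `ThreadedEx(basesize = ...)`.

```julia
using FLoops

# Hypothetical workload: the cost grows with i, so iterations are unbalanced.
work(i) = sum(sin, 1:(1000 * i))

# Sum work(i) over 1:n with the given executor.
function total(n, ex)
    @floop ex for i in 1:n
        @reduce(s += work(i))
    end
    return s
end

# Few large chunks: low scheduling overhead, but poor load balancing.
coarse = total(64, ThreadedEx(basesize = 32))

# One task per iteration: more overhead, but better load balancing.
fine = total(64, ThreadedEx(basesize = 1))
```

Both calls should compute the same sum up to floating-point reassociation. Which basesize is faster depends on the workload and on the number of threads Julia was started with, so it is worth timing a few values.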

What did you have in the ...? The loop body has to be sufficiently time-consuming for you to observe the Julia scheduler doing something. If it’s too short, the Julia scheduler may reuse the same worker thread to execute all of the given tasks.

The main difficulty here is that, if ... is too long, any clever things that FoldsThreads can do will be negligible by comparison. This is why the announcement post “[ANN] FoldsThreads.jl: A zoo of pluggable thread-based data-parallel execution mechanisms” has somewhat more elaborate examples to emphasize the differences between the schedulers.

Another difficulty is interaction with the GC. If the iteration space is not too large (i.e., length(xs) below is not too large), I’d simply preallocate a record array and record the thread id used in each iteration of the loop:

using FLoops  # provides @floop

record = zeros(Int, length(xs))  # one slot per iteration

@floop ... for (i, x) in enumerate(xs)
   ...
   record[i] = Threads.threadid()  # thread that ran iteration i
end

where the loop body ... should be some non-trivial computation. You’d also need to gather several samples of record to get a “typical” picture of the execution, since a single run can be unrepresentative.
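For reference, a self-contained sketch of this recording approach could look like the following. The helper names `busywait` and `sample_threadids` are made up for illustration, and the spin loop is only a stand-in that I chose for the elided loop body. The iteration time (1 ms) and the number of samples (5) are arbitrary.

```julia
using FLoops
using Base.Threads: threadid

# Hypothetical stand-in for the elided loop body: spin for about `seconds`.
function busywait(seconds)
    t0 = time_ns()
    while (time_ns() - t0) < seconds * 1e9
    end
end

# Run one loop with executor `ex` and return the thread id seen per iteration.
function sample_threadids(xs, ex)
    record = zeros(Int, length(xs))  # preallocated outside the loop
    @floop ex for i in eachindex(xs)
        busywait(xs[i])
        record[i] = threadid()
    end
    return record
end

xs = fill(0.001, 32)  # 32 iterations of roughly 1 ms each

# Gather several samples, since a single run may not be typical.
samples = [sample_threadids(xs, ThreadedEx(basesize = 1)) for _ in 1:5]

for record in samples
    println(record)
end
```

Start Julia with more than one thread (e.g. `julia -t 4`), otherwise every entry will be the same thread id. Swapping the executor passed as `ex` lets you compare how different schedulers distribute the iterations.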
