Looking for advice on threading

FYI, with Transducers.jl (see the post "Thread- and process-based parallelisms in Transducers.jl (+ some news)"), this is simply reduce(+, Map(identity), x; basesize=length(x) ÷ nChunks).

I find this kind of approach limiting: it's impossible to write a parallel version of sum(f, xs) this way without relying on compiler internals (aka Core.Compiler.return_type), because the result type of f has to be known before f is ever called.

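I think this pattern would also cause false sharing (threads repeatedly writing to adjacent elements of one shared array contend for the same cache line), which can hurt performance.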
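A minimal sketch of an alternative, assuming the pattern under discussion preallocates a shared per-thread accumulator array: give each chunk its own task that computes a local sum(f, chunk), then combine the fetched partial results. The result type falls out of the reduction itself, so Core.Compiler.return_type is not needed, and no two tasks write into a shared array. `chunked_sum` and its `nchunks` keyword are hypothetical names for illustration, not part of Base or any package.

```julia
using Base.Threads: @spawn

# Hypothetical helper: parallel sum(f, xs) without a preallocated per-thread
# accumulator array, so the result type never has to be guessed up front.
function chunked_sum(f, xs::AbstractVector; nchunks::Integer = Threads.nthreads())
    # Empty input: defer to Base so empty-collection behaviour matches sum(f, xs).
    isempty(xs) && return sum(f, xs)
    chunklen = cld(length(xs), nchunks)  # at least one element per chunk
    # One task per chunk; each task accumulates into its own local state,
    # so there is no shared array for the tasks to contend on.
    tasks = [@spawn(sum(f, view(xs, idxs)))
             for idxs in Iterators.partition(eachindex(xs), chunklen)]
    # Combine the partial sums serially; the type is whatever sum returns.
    return sum(fetch, tasks)
end
```

Using more chunks than threads (nchunks > Threads.nthreads()) can help load balancing when f has uneven cost; that is a tuning choice, not a requirement of the approach.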
