If `ThreadPools.tmap` fits your use-case, then I think that's great! But I'd point out a few caveats:
- It looks like `ThreadPools.tmap` collects the input into an array first. That's a robust strategy, but not optimal; for example, you cannot interleave the iteration and the computation this way.
- It pre-computes the output element type using compiler internals, so it is not a typocalypse-free solution. Practically, this means that updating `julia` can break your code (see the sketch after this list).
- The main goal of ThreadPools.jl is to separate latency-critical code from throughput-oriented code. ThreadPools.jl achieves this with a clever trick, but unfortunately this impedes dynamic scheduling by the `julia` runtime. As a result, using it at the library level means we lose the composable nested parallelism ecosystem that Julia's parallel runtime is designed to support.
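To make the second caveat concrete, here is a minimal sketch of the general pattern of pre-computing an output element type from type inference. This is not ThreadPools.jl's actual code; `f` and `T` are just illustrative names:

```julia
# Hypothetical illustration of eltype pre-computation via compiler internals.
f(x) = x > 0 ? x : 0.0   # inference may report Union{Float64, Int64} for Int input

# The inferred return type is an implementation detail of the compiler:
T = Core.Compiler.return_type(f, Tuple{Int})

# Allocating the output based on `T` bakes an inference result into your
# program; a julia release that infers a wider or narrower type silently
# changes the container type and can break downstream code.
out = Vector{T}(undef, 4)
```

A typocalypse-free approach instead derives the element type from the values actually produced (widening the container as needed), so it does not depend on what the compiler happens to infer.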
`ThreadsX.map` converts the iterator transformations (e.g., `Iterators.filter`) to transducers and runs the reduction on the innermost iterator. So, if you have `ThreadsX.map(f, (x for x in xs if p(x)))` (or equivalently `ThreadsX.map(f, Iterators.filter(p, xs))`), what matters is whether the iterator (collection) `xs` supports the SplittablesBase.jl API.
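As a quick illustration of that point (a sketch; `xs` and the predicate are placeholders): ranges and arrays support `SplittablesBase.halve`, which is what lets the reduction underneath `ThreadsX.map` split the work across tasks:

```julia
using ThreadsX
using SplittablesBase: halve

xs = 1:100

# `halve` splits the collection into two roughly equal parts; this is the
# operation the parallel reduction relies on to divide work across tasks.
left, right = halve(xs)

# The filter is fused into the reduction as a transducer, so only `xs`
# needs to be splittable, not the generator / Iterators.filter wrapper.
ys = ThreadsX.map(x -> x^2, (x for x in xs if isodd(x)))
```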