If ThreadPools.tmap fits your use case, then I think that’s great! But I’d point out a few caveats:
- It looks like ThreadPools.tmap collects the input into an array first. This is a robust strategy, but not an optimal one: for example, you cannot interleave the iteration and the computation this way.
- It pre-computes the output element type using a compiler internal, so it is not a typocalypse-free solution. In practice, this means that updating julia can break your code.
- The main goal of ThreadPools.jl is to separate latency-critical code from throughput-oriented code. ThreadPools.jl achieves this with a clever trick, but unfortunately the trick impedes dynamic scheduling by the julia runtime. As a result, using it at the library level means losing the composable nested-parallelism ecosystem that Julia’s parallel runtime is designed to support.

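The first two caveats can be illustrated with Base alone. The sketch below is not ThreadPools.jl's code: `produce`, `compute`, and `maybe_missing` are hypothetical names made up for the illustration, and `Base.promote_op` is only my stand-in for whichever inference-based internal is actually used.

```julia
# --- Caveat 1: collecting first vs. interleaving ---
events = String[]
produce(i) = (push!(events, "produce $i"); i)    # hypothetical input source
compute(x) = (push!(events, "compute $x"); x^2)  # hypothetical per-item work

# Collect-first: every input is produced before any computation starts.
inputs = collect(produce(i) for i in 1:3)
map(compute, inputs)
collect_first = copy(events)

# Lazy: each input is consumed as soon as it has been produced.
empty!(events)
map(compute, (produce(i) for i in 1:3))
interleaved = copy(events)

# --- Caveat 2: element type from inference vs. from values ---
maybe_missing(x) = x > 0 ? x : missing  # return type depends on the value

# Inference-based: asks the compiler; the answer is an implementation
# detail and may differ between julia versions.
inferred = Base.promote_op(maybe_missing, Int)

# Value-based: the element type comes from what was actually computed.
ys = map(maybe_missing, [1, 2, 3])
value_based = eltype(ys)

@show collect_first interleaved inferred value_based
```

Comparing `collect_first` with `interleaved` shows the ordering difference, and comparing `inferred` with `value_based` shows why code that depends on the inferred type is at the mercy of the compiler.
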
ThreadsX.map converts the iterator transformations (e.g., Iterators.filter) to transducers and runs the reduction on the innermost iterator. So, if you have ThreadsX.map(f, (x for x in xs if p(x))) (or equivalently ThreadsX.map(f, Iterators.filter(p, xs))), what matters is whether the innermost iterator (collection) xs supports the SplittablesBase.jl API.
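
To make the role of the splitting API concrete, here is a minimal Base-only sketch of the divide-and-conquer idea: recursively halve the innermost collection, run the filter and the map fused on each small chunk, and concatenate the results. This is not how ThreadsX.jl is implemented. `halve_vec` and `filter_map` are hypothetical names, and `halve_vec` is only a stand-in for the halving operation that SplittablesBase.jl provides.

```julia
# Hypothetical stand-in for a "split in half" operation; vectors only.
function halve_vec(xs::AbstractVector)
    mid = (firstindex(xs) + lastindex(xs)) ÷ 2
    return view(xs, firstindex(xs):mid), view(xs, mid+1:lastindex(xs))
end

# Hypothetical parallel "filter then map" over the innermost collection `xs`.
function filter_map(f, p, xs::AbstractVector; basesize::Int = 4)
    if length(xs) <= max(basesize, 1)
        # Sequential base case: filter and map run fused on one chunk.
        return [f(x) for x in xs if p(x)]
    end
    left, right = halve_vec(xs)
    # The right half may run on another thread; the left half runs here.
    task = Threads.@spawn filter_map(f, p, right; basesize = basesize)
    ys = filter_map(f, p, left; basesize = basesize)
    # Concatenating left-then-right preserves the input order.
    return vcat(ys, fetch(task))
end

result = filter_map(x -> x^2, isodd, collect(1:20))
```

Only `xs` needs to be splittable here: the predicate and the mapping function ride along into each chunk, which is the same reason the filter wrapper itself does not need to support splitting.
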