Multithreaded mapping of an iterator (not a collection)

tkf · October 7, 2020, 9:45pm

If ThreadPools.tmap fits your use-case, then I think that’s great! But I’d point out a few caveats:

It looks like ThreadPools.tmap collects the input into an array first. That is a robust strategy but not an optimal one: for example, you cannot interleave the iteration and the computation this way.
It pre-computes the output element type using compiler internals, so it is not a typocalypse-free solution. In practice, this means that updating Julia can break your code.
The main goal of ThreadPools.jl is to separate latency-critical code from throughput-oriented code. ThreadPools.jl achieves this with a clever trick, but unfortunately the trick impedes dynamic scheduling by the Julia runtime. As a result, using it at the library level means we lose the composable nested parallelism that Julia’s parallel runtime is designed to support.
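
To make the first caveat concrete, here is a rough sketch of what interleaving iteration and computation can look like using only Base. The name channel_map and its ntasks keyword are made up for this illustration; this is not how ThreadPools.jl or ThreadsX.jl work internally. One task walks the iterator lazily and feeds a bounded Channel, while spawned workers consume items as they arrive:

```julia
using Base.Threads: @spawn

# Hypothetical helper for illustration only (not part of any package).
function channel_map(f, itr; ntasks = Threads.nthreads())
    # Producer: iterate lazily and tag every item with its position.
    # The bounded buffer keeps the producer from running far ahead.
    inputs = Channel{Tuple{Int,Any}}(ntasks) do ch
        for (i, x) in enumerate(itr)
            put!(ch, (i, x))
        end
    end
    # Workers: each one pulls items until the channel is closed and drained.
    tasks = map(1:ntasks) do _
        @spawn begin
            results = Tuple{Int,Any}[]
            for (i, x) in inputs
                push!(results, (i, f(x)))
            end
            results
        end
    end
    # Merge the per-worker results and restore the input order.
    tagged = reduce(vcat, fetch.(tasks))
    sort!(tagged; by = first)
    return [y for (_, y) in tagged]
end

squares = channel_map(x -> x^2, (x for x in 1:10 if isodd(x)))
```

The input is never collected up front, so f can start running while the iterator is still producing. The price is one channel hand-off per item, which only pays off when f is reasonably expensive.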

ThreadsX.map converts the iterator transformations (e.g., Iterators.filter) to transducers and runs the reduction on the innermost iterator. So, if you have ThreadsX.map(f, (x for x in xs if p(x))) (or equivalently ThreadsX.map(f, Iterators.filter(p, xs))), what matters is whether the iterator (collection) xs supports the SplittablesBase.jl API.
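
As a rough picture of why only the innermost collection needs to be splittable, here is a Base-only sketch. The function split_filter_map and its basesize keyword are hypothetical, and plain index halving stands in for what SplittablesBase.halve would do; it is not ThreadsX's actual implementation. The collection xs is split recursively, and the filter p and the mapping f are applied only once a chunk is small enough:

```julia
using Base.Threads: @spawn

# Hypothetical helper for illustration only (not the ThreadsX implementation).
function split_filter_map(f, p, xs::AbstractVector; basesize = 4)
    # Base case: the chunk is small, so filter and map it sequentially.
    if length(xs) <= basesize
        return [f(x) for x in xs if p(x)]
    end
    # Recursive case: halve the underlying collection, not the filtered one.
    mid = (firstindex(xs) + lastindex(xs)) ÷ 2
    left = @spawn split_filter_map(f, p, view(xs, firstindex(xs):mid); basesize = basesize)
    right = split_filter_map(f, p, view(xs, (mid + 1):lastindex(xs)); basesize = basesize)
    # Concatenating left before right preserves the input order.
    return vcat(fetch(left), right)
end

odd_squares = split_filter_map(x -> x^2, isodd, 1:10)
```

The filtered sequence has an unknown length and cannot be halved directly, but xs can, so the filter simply travels along with each chunk.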
