I wanted to ask about an analogue to Java's (multi-threaded) stream collect, i.e. the collect method documented in Stream (Java Platform SE 8).
In other words: I define functions `consume!(accumulator, item)` and `merge!(leftAccumulator, rightAccumulator)`, such that `consume!(acc, item)` and `merge!(acc, consume!(instantiateAccumulator(), item))` are equivalent, and the thing behaves like a map-reduce with the mapping `item -> consume!(instantiateAccumulator(), item)` and the reduction `(left, right) -> merge!(left, right)` (and initial / empty value `instantiateAccumulator()`).
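To make the contract concrete, here is a minimal sketch of what I mean, using plain static chunking on top of Base.Threads. The name `collect_chunked` and the `nchunks` keyword are made up for illustration and are not part of any library. It assumes `items` is an `AbstractVector`, that `consume!` mutates its first argument, and that `merge!` returns the merged left accumulator.

```julia
# Hypothetical helper, not a library API: one accumulator per chunk,
# then an ordered left-to-right merge of the per-chunk accumulators.
function collect_chunked(instantiateAccumulator, consume!, merge!,
                         items::AbstractVector; nchunks = Threads.nthreads())
    chunksize = max(1, cld(length(items), nchunks))
    tasks = map(Iterators.partition(items, chunksize)) do chunk
        Threads.@spawn begin
            acc = instantiateAccumulator()   # allocated once per chunk, not per item
            for item in chunk
                consume!(acc, item)          # assumed to mutate acc in place
            end
            acc
        end
    end
    # Merge in chunk order, so the reduction does not have to be commutative.
    return foldl(merge!, map(fetch, tasks); init = instantiateAccumulator())
end
```

For example, `collect_chunked(() -> Int[], push!, append!, 1:10)` gives back the items in their original order. Static chunks like this balance badly when per-item cost varies a lot.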
The context is that allocating and merging accumulators is often more expensive than consuming one additional item (allocations in particular!).
Further, I wanted to ask about a multi-threaded variant that supports widely different computation times per item, at modest overhead per item. The API almost forces the implementation: you split the input stream into one chunk per thread, and each chunk gets its own accumulator. When a chunk is done, you check whether the previous or next chunk is also done (then `merge!`), or otherwise steal work from the tail of another chunk (with a new accumulator, since the operation need not be commutative). To make this possible, we probably unavoidably pay one, typically uncontested, `atomic_cas!` per item (i.e. this is inadequate for very cheap ops, but it does support fat-tailed distributions of `consume!` time per item).
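A rough sketch of the scheme I have in mind, to show where the per-item `atomic_cas!` comes from. All names here (`pack`, `unpack`, `claim_front!`, `steal_tail!`, `collect_stealing`, `ntasks`) are hypothetical. It is simplified compared to the description above: instead of merging with a finished neighbour right away, every finished segment is recorded and all segments are merged in index order at the end. It assumes fewer than 2^32 items and that merging an empty accumulator is a no-op.

```julia
using Base.Threads: Atomic, atomic_cas!, @spawn

# A slot holds the unclaimed index range lo:hi-1, packed into one word
# so that owner and thieves can update it with a single CAS.
pack(lo, hi) = (UInt64(lo) << 32) | UInt64(hi)
unpack(s) = (Int(s >> 32), Int(s & 0xffffffff))

# Owner side: claim the first unclaimed index, or return 0 if none is left.
function claim_front!(slot)
    while true
        s = slot[]
        lo, hi = unpack(s)
        lo >= hi && return 0
        atomic_cas!(slot, s, pack(lo + 1, hi)) === s && return lo
    end
end

# Thief side: cut off the upper half mid:hi-1, or return (0, 0) if empty.
function steal_tail!(slot)
    while true
        s = slot[]
        lo, hi = unpack(s)
        lo >= hi && return 0, 0
        mid = lo + (hi - lo) ÷ 2
        atomic_cas!(slot, s, pack(lo, mid)) === s && return mid, hi
    end
end

function collect_stealing(instantiateAccumulator, consume!, merge!,
                          items::AbstractVector; ntasks = Threads.nthreads())
    n = length(items)
    bounds = [1 + (k * n) ÷ ntasks for k in 0:ntasks]
    slots = [Atomic{UInt64}(pack(bounds[k], bounds[k + 1])) for k in 1:ntasks]
    segments = Tuple{Int,Any}[]          # (first index, accumulator)
    lk = ReentrantLock()
    @sync for k in 1:ntasks
        @spawn begin
            mine = slots[k]
            start = bounds[k]
            while start != 0
                acc = instantiateAccumulator()   # new accumulator per segment
                while true
                    i = claim_front!(mine)       # the one CAS per item
                    i == 0 && break
                    consume!(acc, items[i])
                end
                Base.@lock lk push!(segments, (start, acc))
                start = 0
                for victim in slots              # own slot is empty here
                    lo, hi = steal_tail!(victim)
                    if lo != 0
                        mine[] = pack(lo, hi)    # stolen range becomes my segment
                        start = lo
                        break
                    end
                end
            end
        end
    end
    # Each segment covers a contiguous index run, so sorting by first index
    # and merging left to right preserves the original item order.
    sort!(segments; by = first)
    return foldl((l, r) -> merge!(l, r[2]), segments[2:end]; init = segments[1][2])
end
```

The owner's CAS only fails when a thief touches the same slot at the same moment, which is why I call it typically uncontested. The cost is one extra accumulator per steal.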
Am I simply failing to RTFM Transducers.jl?