Multithreaded mapreduce / collect

Hi,
I wanted to ask whether there is an analogue to Java's (multi-threaded) Stream.collect; see Stream (Java Platform SE 8).

In other words: I define functions instantiateAccumulator(), consume!(accumulator, item) and merge!(leftAccumulator, rightAccumulator), such that
consume!(acc, item) and merge!(acc, consume!(instantiateAccumulator(), item)) are equivalent. The whole thing then behaves like a map-reduce with the mapping item -> consume!(instantiateAccumulator(), item), the reduction (left, right) -> merge!(left, right), and the initial / empty value instantiateAccumulator().
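
For concreteness, here is a minimal Java sketch of that contract using the three-argument collect. The class Acc and the names consume, merge and joinDigits are made up for illustration; they stand in for instantiateAccumulator(), consume! and merge!. The accumulator is deliberately order-sensitive to show that the merge does not have to be commutative.

```java
import java.util.stream.IntStream;

public class CollectSketch {
    /** Hypothetical accumulator that appends items in encounter order. */
    static final class Acc {
        final StringBuilder text = new StringBuilder();

        // Plays the role of consume!(acc, item).
        void consume(int item) { text.append(item); }

        // Plays the role of merge!(left, right): right is appended to left.
        void merge(Acc right) { text.append(right.text); }
    }

    public static String joinDigits(int n) {
        Acc result = IntStream.range(0, n)
                .parallel()
                // supplier, accumulator, combiner
                .collect(Acc::new, Acc::consume, Acc::merge);
        return result.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinDigits(10));
    }
}
```

Because the range is an ordered stream, the parallel collect still yields the items in encounter order, so joinDigits(10) gives "0123456789" no matter how the work was split.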

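The context is that instantiating and merging accumulators is often more expensive than consuming one additional item (e.g. because the accumulator has to be allocated), so I want as few accumulators as possible rather than one per item.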

Further, I wanted to ask about a multi-threaded variant that supports widely varying computation times per item, at modest overhead per item. The API almost forces the implementation: you split the input stream into one chunk per thread, and each chunk gets its own accumulator. When a thread finishes its chunk, it checks whether the previous or next chunk is also done (then merge!), and otherwise steals work from the tail of another chunk. The stolen work gets a new accumulator, because the operation need not be commutative. To make this possible, we probably unavoidably pay one, typically uncontended, atomic_cas! per item. That makes the scheme inadequate for very cheap operations, but it does support fat-tailed distributions of consume! time per item.
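
To make the scheme concrete, here is a rough Java sketch of it, not a tuned implementation. All names (StealingCollect, Segment, work, steal) are hypothetical. It simplifies the description above in two ways: a thief takes the back half of a victim's remaining range with a single CAS, and all merges happen at the end in index order instead of eagerly between finished neighbours. The owner still pays one compareAndSet per item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

public class StealingCollect {
    /** A contiguous index range with its own accumulator. */
    static final class Segment<A> {
        final AtomicLong range; // packed (next unclaimed index, exclusive end)
        final A acc;
        Segment(int lo, int hi, A acc) { this.range = new AtomicLong(pack(lo, hi)); this.acc = acc; }
    }

    static long pack(int lo, int hi) { return ((long) lo << 32) | (hi & 0xFFFFFFFFL); }
    static int loOf(long r) { return (int) (r >>> 32); }
    static int hiOf(long r) { return (int) r; }

    public static <T, A> A collect(List<T> items, int threads, Supplier<A> supplier,
                                   BiConsumer<A, T> consume, BiConsumer<A, A> merge) {
        int n = items.size();
        int workers = Math.max(1, Math.min(threads, n)); // avoid empty initial chunks
        // Segments keyed by start index, so iteration order is input order.
        ConcurrentSkipListMap<Integer, Segment<A>> segments = new ConcurrentSkipListMap<>();
        List<Thread> pool = new ArrayList<>();
        for (int t = 0; t < workers; t++) {
            int start = (int) ((long) n * t / workers);
            int end = (int) ((long) n * (t + 1) / workers);
            Segment<A> own = new Segment<>(start, end, supplier.get());
            segments.put(start, own);
            pool.add(new Thread(() -> work(own, items, segments, supplier, consume)));
        }
        for (Thread th : pool) th.start();
        try {
            for (Thread th : pool) th.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Merge left to right; no commutativity is needed.
        A result = supplier.get();
        for (Segment<A> seg : segments.values()) merge.accept(result, seg.acc);
        return result;
    }

    static <T, A> void work(Segment<A> seg, List<T> items,
                            ConcurrentSkipListMap<Integer, Segment<A>> segments,
                            Supplier<A> supplier, BiConsumer<A, T> consume) {
        while (seg != null) {
            while (true) {
                long r = seg.range.get();
                int lo = loOf(r), hi = hiOf(r);
                if (lo >= hi) break; // segment drained
                // One CAS per item: claim index lo, then consume it.
                if (seg.range.compareAndSet(r, pack(lo + 1, hi))) {
                    consume.accept(seg.acc, items.get(lo));
                }
            }
            seg = steal(segments, supplier);
        }
    }

    /** Takes the back half of some segment's unclaimed range, or returns null. */
    static <A> Segment<A> steal(ConcurrentSkipListMap<Integer, Segment<A>> segments,
                                Supplier<A> supplier) {
        for (Segment<A> victim : segments.values()) {
            while (true) {
                long r = victim.range.get();
                int lo = loOf(r), hi = hiOf(r), remaining = hi - lo;
                if (remaining < 2) break; // leave the last item to its owner
                int mid = hi - remaining / 2;
                if (victim.range.compareAndSet(r, pack(lo, mid))) {
                    Segment<A> stolen = new Segment<>(mid, hi, supplier.get());
                    segments.put(mid, stolen); // new accumulator for the stolen tail
                    return stolen;
                }
            }
        }
        return null;
    }
}
```

A call could look like StealingCollect.<Integer, StringBuilder>collect(items, 4, () -> new StringBuilder(), (sb, x) -> sb.append(x), (l, r) -> l.append(r)). Every steal adds one accumulator and one later merge, so the number of accumulators grows with the number of steals rather than with the number of items.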

Am I simply failing to RTFM Transducers.jl?