You could:

- Use `task_local_storage` to have a per-task array (rather than a per-thread one); see e.g. "No more threadid indexing? [thread-local storage]", post #12 by stevengj.
- Use ChunkSplitters.jl (à la "Why has simple threading using `threadid` become so complex in v1.12...?", post #28 by foobar_lv2) to create a per-chunk array. (This has the downside of imposing an essentially static parallelization schedule, which can load-balance poorly if `processing(B[idx])` takes a very different amount of time depending on the argument.)
- Use higher-level primitives like those in OhMyThreads.jl or Transducers.jl, e.g. by expressing this as a parallel reduction.
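
For illustration, here is a rough sketch of the per-chunk idea using only Base, with `Iterators.partition` standing in for ChunkSplitters.jl. The names `chunked_sum` and `ntasks` are made up for this example, and `processing` is a placeholder for whatever the real function does. It also assumes `processing` returns the same type as the elements of `B`, so that `zero(eltype(B))` is a valid starting value.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-in for the `processing` function from the question.
processing(x) = x^2

# Sum `processing(B[idx])` over all indices, using one task per chunk.
# Each task owns its accumulator, so nothing is indexed by `threadid()`.
function chunked_sum(B; ntasks = nthreads())
    isempty(B) && return zero(eltype(B))
    # Split the indices into at most `ntasks` contiguous chunks.
    chunk_len = max(1, cld(length(B), ntasks))
    chunks = Iterators.partition(eachindex(B), chunk_len)
    tasks = map(chunks) do idxs
        @spawn begin
            acc = zero(eltype(B))  # local to this task
            for idx in idxs
                acc += processing(B[idx])
            end
            acc
        end
    end
    # Combine the per-chunk results once all tasks have finished.
    return sum(fetch, tasks)
end

chunked_sum(collect(1:100))  # == 338350
```

If the cost of `processing` varies a lot, passing an `ntasks` several times larger than `nthreads()` gives the scheduler smaller pieces to balance, at the price of a little more task overhead.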