Multi-threaded Array building

You could do one of the following:

  1. Use `task_local_storage` to have a per-task array (rather than a per-thread one); see e.g. "No more threadid indexing? [thread-local storage]", post #12 by stevengj.
  2. Use ChunkSplitters.jl (à la "Why has simple threading using `threadid` become so complex in v1.12...?", post #28 by foobar_lv2) to create a per-chunk array. The downside is that this essentially imposes a static parallelization schedule, which can balance load poorly if `processing(B[idx])` takes very different amounts of time depending on its argument.
  3. Use some higher-level primitives like in OhMyThreads.jl or Transducers.jl, e.g. expressing this as a parallel reduction operation.
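
For illustration, here is a minimal sketch of the per-chunk idea from option 2, written with only `Base` so that it does not depend on any particular ChunkSplitters.jl API. The names `build_parallel` and `ntasks`, the stand-in `processing` function, and the `Float64` element type are assumptions made for this example, not something taken from the original question.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-in for the real, possibly expensive, per-element work.
processing(x) = sqrt(x)

function build_parallel(B; ntasks = nthreads())
    # Split the indices of B into contiguous chunks, roughly one per task.
    chunksize = max(1, cld(length(B), ntasks))
    chunks = Iterators.partition(eachindex(B), chunksize)

    # Each task pushes into its own local array, so no locks and no
    # threadid() indexing are needed.
    tasks = map(chunks) do idxs
        @spawn begin
            local_out = Float64[]  # assumed element type for this sketch
            for idx in idxs
                push!(local_out, processing(B[idx]))
            end
            local_out
        end
    end

    # Append the per-chunk arrays in chunk order, which preserves input order.
    out = Float64[]
    for t in tasks
        append!(out, fetch(t))
    end
    return out
end
```

Calling `build_parallel(B; ntasks = 4 * nthreads())` creates more, smaller chunks than threads. Because each chunk is its own task, the scheduler can then spread uneven work somewhat better, at the cost of a bit more task overhead.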