Overhead of `Threads.@threads`

Our actual code is not really `vmap!`-able or easily expressed in this form. It's often more like

for element in some_range
  # do stuff accessing `u[some_indices..., element]` and
  # `some_cache[some_indices_again..., element]`
  # store some result in `du[some_other_indices..., element]`
end
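
To make the pattern above concrete, here is a minimal sketch of such a loop threaded with `Threads.@threads`. The function name `rhs!`, the array shapes, and the per-element arithmetic are hypothetical and only chosen for illustration. The real kernels are more involved, but the access pattern (the last index is the element, and each element writes only to its own slice of `du`) is the same.

```julia
using Base.Threads: @threads

# Hypothetical per-element kernel: reads `u` and `cache`, writes `du`.
# Each iteration touches only the slice belonging to `element`,
# so iterations are independent and can run on different threads.
function rhs!(du, u, cache)
    @threads for element in axes(u, 3)
        for j in axes(u, 2), i in axes(u, 1)
            du[i, j, element] = cache[i, element] * u[i, j, element]
        end
    end
    return du
end

# Small illustrative problem: 4x4 nodes per element, 100 elements.
u = rand(4, 4, 100)
cache = rand(4, 100)
du = similar(u)
rhs!(du, u, cache)
```

When the work per element is small, as in this sketch, the fixed cost of starting and synchronizing the threaded loop can be a noticeable fraction of the total runtime, which is the overhead this thread is about.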

Do you mean JuliaSIMD/ThreadingUtilities.jl (utilities for low-overhead threading in Julia)?