I had a possibly similar need before and ended up using FLoops.jl, which I can recommend. You may also want to look at my own question from a while ago.
I use the following pattern; maybe it's useful in your case. I think your `work_vector` could be `varexternal` below.
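```julia
using FLoops

# Hypothetical setup so the snippet runs on its own
nparticles = 100
varexternal = zeros(3)           # e.g. your work_vector
out = zeros(nparticles)
f(ve) = (ve .= 1.0; sum(ve))     # placeholder computation that mutates ve

ex = ThreadedEx()  # or SequentialEx()
@floop ex for i = 1:nparticles
    @init ve = deepcopy(varexternal)
    # compute something with f, possibly using the external variable ve;
    # each base case (chunk of iterations run by one task) gets its own ve,
    # so ve can be mutated in place without data races
    out[i] = f(ve)
end
```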
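If it helps to see what `@init` buys you, here is a rough, dependency-free sketch of the same idea using only Base threading: split the range into one chunk per thread and give each task its own `deepcopy` of the buffer. This is my own illustration, not how FLoops implements `@init`; `f!` and the buffer contents are hypothetical.

```julia
using Base.Threads

# Hypothetical stand-ins for the names used above
nparticles = 1_000
varexternal = zeros(3)           # scratch buffer template
out = zeros(nparticles)

# Hypothetical per-particle work: overwrite the buffer in place, then reduce it
function f!(ve, i)
    ve .= (i, 2i, 3i)            # in-place write into the task-private buffer
    return sum(ve)               # equals 6i
end

# One chunk of indices per thread; each task owns one copy of the buffer,
# so no two tasks ever write to the same `ve`
chunks = Iterators.partition(1:nparticles, cld(nparticles, nthreads()))
@sync for chunk in chunks
    @spawn begin
        ve = deepcopy(varexternal)
        for i in chunk
            out[i] = f!(ve, i)
        end
    end
end
```

Afterwards `varexternal` is still all zeros, since every task only mutated its private copy. FLoops does this chunking for you and lets you switch between `ThreadedEx()` and `SequentialEx()` without touching the loop body.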