How to preallocate for a parallel Monte Carlo simulation?

I had a possibly similar need before and ended up using FLoops.jl, which I can recommend. Maybe see also my own question from a while ago.

I use the following pattern; maybe it’s useful in your case. I think your work_vector could take the place of varexternal below.

using FLoops

ex = ThreadedEx() # or SequentialEx()

@floop ex for i = 1:nparticles
    @init ve = deepcopy(varexternal)

    # compute something with f, potentially using the external variables in ve
    # (each base case, i.e. each chunk of iterations handled by one task, has its
    # "own" ve, so ve can be mutated in-place)
    out[i] = f(ve)
end
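
For something you can paste and run, here is a minimal sketch of the same pattern with the placeholders filled in. The names `simulate!` and `run_simulation`, the buffer length of 64, and the particle count are all made up for illustration and are not from your code. It also assumes a recent Julia (1.7 or later, I believe), where the default RNG is task-local, so calling `randn!` from several tasks should be safe.

```julia
using FLoops
using Random

# Hypothetical stand-in for the per-particle work: overwrite the scratch
# buffer in place and reduce it to a single number.
function simulate!(work::Vector{Float64})
    randn!(work)  # fills `work` using the default RNG
    return sum(abs2, work) / length(work)
end

# Hypothetical driver: `work_vector` only serves as a template for the buffers.
function run_simulation(nparticles::Integer, work_vector::Vector{Float64};
                        ex = ThreadedEx())
    out = Vector{Float64}(undef, nparticles)
    @floop ex for i in 1:nparticles
        # Runs once per base case (chunk of iterations), not once per iteration,
        # so the buffer is reused across the iterations of that chunk.
        @init work = similar(work_vector)
        out[i] = simulate!(work)
    end
    return out
end

out = run_simulation(1_000, zeros(64))
```

I used `similar` instead of `deepcopy` here because the buffer is fully overwritten on every iteration; if your work vector carries initial values that `f` reads, `deepcopy` as in the pattern above is probably the safer choice. Passing `ex = SequentialEx()` runs the same loop on a single task, which is handy for debugging.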