Work array for parallel loop

I’m investigating parallelization of some loops in an older code of mine. The structure is like this:

work_vector = zeros(small_size)
for i = 1 : big_size
    x = big_vector1[i]
    # ...
    # compute y from x using work_vector for scratch space
    # ...
    big_vector2[i] = y
end

I think that if I just insert Threads.@threads in front of the for loop, this will end badly because the threads will collide on the work array. What is the recommended approach?

One way is to create a copy of the work vector for each thread.

This small package can then help to write the loop: GitHub - m3g/ChunkSplitters.jl: Simple chunk splitters for parallel loop executions

Just to be clear: the solution you are proposing is like this:

Threads.@threads for (irange, _) in chunks(big_vector1, 10, :batch)
    # one scratch vector per chunk, so no two tasks ever share it
    work_vector = zeros(small_size)
    for i in irange
        x = big_vector1[i]
        # compute y from x using work_vector for scratch
        big_vector2[i] = y
    end
end

Yes. Either that, or allocate a vector of work vectors, one per chunk, outside the threaded loop (if you want to keep that allocation out of the loop).
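
For reference, here is a minimal sketch of that second option, with the work vectors preallocated outside the threaded loop. It uses only Base (Iterators.partition) for the chunking instead of ChunkSplitters.jl. The names compute_y! and process! are hypothetical, and compute_y! is only a stand-in for the real computation. The buffers are indexed by chunk number rather than by threadid(), which should stay safe even if tasks migrate between threads.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical stand-in for the real computation: fills the scratch
# vector `work` from `x` and reduces it to a single value.
function compute_y!(work, x)
    for k in eachindex(work)
        work[k] = x * k
    end
    return sum(work)
end

# Hypothetical driver: one preallocated scratch vector per chunk.
function process!(big_vector2, big_vector1, small_size; nchunks = nthreads())
    n = length(big_vector1)
    # Contiguous index ranges, one per chunk (max guards against n == 0).
    ranges = collect(Iterators.partition(1:n, max(1, cld(n, nchunks))))
    # Scratch buffers allocated once, outside the threaded loop.
    work_vectors = [zeros(small_size) for _ in ranges]
    @threads for c in eachindex(ranges)
        work = work_vectors[c]   # indexed by chunk, not by threadid()
        for i in ranges[c]
            big_vector2[i] = compute_y!(work, big_vector1[i])
        end
    end
    return big_vector2
end

big_vector1 = collect(1.0:1000.0)
big_vector2 = similar(big_vector1)
process!(big_vector2, big_vector1, 4)
```

If process! is called many times, work_vectors could be hoisted out and passed in as an argument, so the buffers are reused across calls as well.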