Writing to a shared buffer in a multi-threaded for loop

TaskLocalValues.jl is just a thin wrapper around Base.task_local_storage, and using the latter directly gives you more flexibility. For example, when you create the local buffer for the first time in a given task, you could also store it in a global array shared by all tasks (protected by a lock) and reduce over that array later, i.e. something like:

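# my_buffers (a global Vector of buffers) and my_buffers_lock (e.g. a ReentrantLock)
# are assumed to be defined elsewhere, as is create_new_buffer()
buf = get!(task_local_storage(), :MY_BUFFER_SYMBOL) do
    # create a new buffer if it doesn't exist yet for this task
    newbuf = create_new_buffer()
    # register it in the shared array; the lock guards concurrent push! calls
    lock(my_buffers_lock) do
        push!(my_buffers, newbuf)
    end
    newbuf
end::SomeBufferType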
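
To make the pattern above concrete, here is a minimal self-contained sketch of it, including the reduction step at the end. The buffer type (a 4-element Vector{Float64}), the helper names task_buffer and accumulate_bins, and the toy binning workload are all my own assumptions for illustration, not something from your code:

```julia
using Base.Threads: @threads

# Hypothetical shared registry of every task-local buffer, plus its lock
const my_buffers = Vector{Vector{Float64}}()
const my_buffers_lock = ReentrantLock()

# Hypothetical buffer constructor; replace with whatever your buffer is
create_new_buffer() = zeros(Float64, 4)

# Return this task's buffer, creating and registering it on first use
function task_buffer()
    get!(task_local_storage(), :MY_BUFFER_SYMBOL) do
        newbuf = create_new_buffer()
        lock(my_buffers_lock) do
            push!(my_buffers, newbuf)
        end
        newbuf
    end::Vector{Float64}
end

# Toy workload: sum the values of `data` into 4 bins
function accumulate_bins(data)
    empty!(my_buffers)              # forget buffers from earlier calls
    @threads for x in data
        buf = task_buffer()         # only this task writes to buf
        buf[mod1(x, 4)] += x
    end
    # all tasks have finished here, so reading the buffers is safe
    return reduce(+, my_buffers; init = zeros(Float64, 4))
end
```

Calling accumulate_bins(1:100) should give the same bins as a serial loop. Note that the lock is only taken once per task (when the buffer is first created), so it should not be a bottleneck in the hot loop.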

Alternatively, you can replace your threaded loop with a loop over equal-sized chunks of the array, via ChunkSplitters.jl, and allocate one buffer per chunk, as described here: No more threadid indexing? [thread-local storage] - #13 by lmiq. The tradeoff is that this pushes more load-balancing responsibility onto you: the chunks are fixed up front, so if some iterations are much more expensive than others, you have to pick the number and size of chunks accordingly.
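
A rough sketch of the chunked variant, for the same kind of toy binning workload. To keep it dependency-free I'm using Base's Iterators.partition here as a stand-in for ChunkSplitters.jl (the linked post shows the ChunkSplitters version); the function name chunked_bins and the 4-bin buffer are again just assumptions for illustration:

```julia
using Base.Threads: @spawn, nthreads

# Toy workload: sum the values of `data` into 4 bins, one buffer per chunk
function chunked_bins(data; nchunks = nthreads())
    # Chunk length chosen so that we get at most `nchunks` chunks
    chunk_len = max(1, cld(length(data), nchunks))
    tasks = map(Iterators.partition(data, chunk_len)) do chunk
        @spawn begin
            buf = zeros(Float64, 4)     # buffer owned by this chunk's task
            for x in chunk
                buf[mod1(x, 4)] += x
            end
            buf                         # returned via fetch below
        end
    end
    # fetch waits for each task and collects its buffer
    return reduce(+, fetch.(tasks); init = zeros(Float64, 4))
end
```

No lock or task-local storage is needed here, since each buffer is local to its task and is handed back through fetch. If the work per element is uneven, passing a larger nchunks (a few times nthreads()) is one simple way to give the scheduler more room to balance the load.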