TaskLocalValues.jl is just a thin wrapper around Base.task_local_storage, which provides more flexibility when used directly. With the latter, when you create the local buffer for the first time in a given task, you can also store it in a global array shared by all tasks (protected by a lock) so that you can reduce over the buffers later, i.e. something like:
    buf = get!(task_local_storage(), :MY_BUFFER_SYMBOL) do
        # create new buffer if it doesn't exist yet for this task
        lock(my_buffers_lock) do
            newbuf = create_new_buffer()
            push!(my_buffers, newbuf)
            newbuf
        end
    end::SomeBufferType
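To make that concrete, here is a minimal self-contained sketch of the same pattern wired into a small parallel sum. The globals `my_buffers` and `my_buffers_lock`, the one-element `Vector{Float64}` buffer type, and the helpers `task_buffer` and `sum_with_task_buffers` are all placeholders I made up for illustration, so adapt them to your actual buffer type and workload.

```julia
using Base.Threads: @spawn

# Hypothetical global registry of every buffer created by any task.
const my_buffers = Vector{Vector{Float64}}()
const my_buffers_lock = ReentrantLock()

# Hypothetical buffer constructor: a single accumulator slot.
create_new_buffer() = zeros(Float64, 1)

# Return the current task's buffer, creating and registering it on first use.
function task_buffer()
    get!(task_local_storage(), :MY_BUFFER_SYMBOL) do
        lock(my_buffers_lock) do
            newbuf = create_new_buffer()
            push!(my_buffers, newbuf)
            newbuf
        end
    end::Vector{Float64}
end

function sum_with_task_buffers(data; ntasks = 4)
    empty!(my_buffers)  # start from a clean registry (no tasks are running yet)
    chunklen = max(1, cld(length(data), ntasks))
    @sync for part in Iterators.partition(data, chunklen)
        @spawn for x in part
            task_buffer()[1] += x  # only this task ever touches its own buffer
        end
    end
    # All tasks have finished, so reducing over the registry is safe here.
    return sum(buf -> buf[1], my_buffers; init = 0.0)
end

total = sum_with_task_buffers(collect(1.0:100.0))
```

The lock is only taken once per task, when the buffer is first created, so it should not matter for performance. Note that the registry keeps every buffer alive until you empty it.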
Alternatively, you can replace your threaded loop with a loop over equal-sized chunks of the array, via ChunkSplitters.jl, and allocate one buffer per chunk, as described in "No more threadid indexing? [thread-local storage]" (post #13 by lmiq). The tradeoff is that this pushes more load-balancing responsibility onto you.
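A rough sketch of the chunked approach follows. To keep it dependency-free I split the indices with `Iterators.partition` from Base as a stand-in for ChunkSplitters.jl (whose exact API I am not reproducing here), and the function name `chunked_sum` and the one-element buffers are again just illustrative assumptions.

```julia
using Base.Threads: @spawn, nthreads

function chunked_sum(data; nchunks = nthreads())
    # Split the index range into roughly equal-sized contiguous chunks.
    chunklen = max(1, cld(length(data), nchunks))
    index_ranges = collect(Iterators.partition(eachindex(data), chunklen))

    # One buffer per chunk, allocated up front: no task-local storage or lock needed.
    buffers = [zeros(Float64, 1) for _ in index_ranges]

    @sync for (buf, inds) in zip(buffers, index_ranges)
        @spawn for i in inds
            buf[1] += data[i]
        end
    end

    # Reduce over the per-chunk buffers once all tasks have finished.
    return sum(buf -> buf[1], buffers; init = 0.0)
end

chunked_total = chunked_sum(collect(1.0:100.0); nchunks = 8)
```

Choosing more chunks than threads gives the scheduler some room to balance uneven work, at the cost of allocating more buffers.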