There is also a discussion of exactly this kind of load balancing here: Parallel load balancing · JuliaNotes.jl
But basically, you can do this:
using ChunkSplitters
using Base.Threads: @spawn
function run(; nchunks=Threads.nthreads())
    # One cache per chunk (not per thread), so no two tasks share a cache
    threadcache = [create_cache() for i in 1:nchunks]
    @sync for (i_range, i_chunk) in chunks(data, nchunks)
        @spawn for i in i_range
            cache = threadcache[i_chunk] # cache owned by this chunk's task
            result = compute_stuff(..., cache) # cache will be mutated in this function
            store_results[i] = result # maybe save the result in a vector
        end
    end
    return store_results
end
You can also increase nchunks to any size, for instance 10 times nthreads(), to take advantage of dynamic scheduling while staying thread-safe by not using threadid().
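For something you can paste and run directly, here is a minimal sketch of the same pattern using only Base, so it does not depend on which ChunkSplitters version you have installed. The function name chunked_squares, the squaring workload, and the one-element scratch buffers are all made up for illustration; they stand in for your compute_stuff and create_cache.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical workload: square each element, going through a per-chunk
# scratch buffer that plays the role of the mutable cache.
function chunked_squares(data; nchunks = 10 * nthreads())
    results = similar(data)
    # Split the indices into at most `nchunks` contiguous ranges.
    chunk_len = max(1, cld(length(data), nchunks))
    ranges = collect(Iterators.partition(eachindex(data), chunk_len))
    # One scratch buffer per chunk (not per thread), so no task shares one.
    caches = [zeros(eltype(data), 1) for _ in ranges]
    @sync for (i_chunk, i_range) in enumerate(ranges)
        @spawn begin
            cache = caches[i_chunk]      # private to this task
            for i in i_range
                cache[1] = data[i]       # mutate the private cache
                results[i] = cache[1]^2  # each index is written by exactly one task
            end
        end
    end
    return results
end
```

Since the caches are indexed by chunk number and not by threadid(), it does not matter which thread ends up running which task, or whether a task migrates between threads.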