The two approaches are mostly equivalent, but the problem with using @threads :static is that if there's anything else multithreaded going on elsewhere in your code that you're not aware of, it'll destructively interfere. E.g. if solve! is actually doing some multithreaded work internally, you could end up being reduced to worse than single-threaded speeds if you use @threads :static, but that is not the case with @spawn.
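To make that concrete, here is a minimal sketch of the two loop styles (reusing the Solver, Solution, solve!, and inputs names from your code; I'm assuming solve!(solver, input) returns a Solution, and I've left out per-task solver reuse here to keep the scheduling contrast clear):

using Base.Threads: @threads, @spawn

solutions = Vector{Solution}(undef, length(inputs))

# Static scheduling: the iteration range is split up front and each block is
# pinned to one thread, so any tasks solve! spawns internally can interfere
# with the pinned outer iterations.
@threads :static for i in eachindex(inputs)
    solutions[i] = solve!(Solver(), inputs[i])
end

# Task-based scheduling: the runtime is free to interleave these tasks with
# whatever solve! spawns internally.
@sync for i in eachindex(inputs)
    @spawn solutions[i] = solve!(Solver(), inputs[i])
end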
This should be possible if you combine a channel with the chunking approach as put forth here: Multithreading with shared memory caches - #6 by danielwe
For your case, that’d look something like
using Base.Threads: @spawn, nthreads

function solve2(inputs; number_of_chunks = 20 * nthreads())
    solutions = Vector{Solution}(undef, length(inputs))
    chunk_size = max(1, length(inputs) ÷ number_of_chunks)
    chunks = Iterators.partition(enumerate(inputs), chunk_size)
    chunk_queue = Channel{eltype(chunks)}(Inf)
    foreach(chunk -> put!(chunk_queue, chunk), chunks)
    close(chunk_queue)
    @sync for _ ∈ 1:nthreads()
        @spawn begin
            # one Solver (and its caches) per task, reused across all its chunks
            solver = Solver()
            for chunk ∈ chunk_queue
                for (i, input) in chunk
                    solutions[i] = solve!(solver, input)
                end
            end
        end
    end
    return solutions
end
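You'd then call it like this (inputs being your vector of problem instances, and the keyword argument purely optional):

solutions = solve2(inputs)                          # default: ~20 chunks per thread
solutions = solve2(inputs; number_of_chunks = 100)  # override the chunk count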
In this case, we only ever create nthreads() different tasks, but there are by default 20 chunks per task, and each task pulls chunks to work on from the shared Channel.
Taking from the channel has some overhead due to locks, so we don't want to do it every iteration; that's why we store the work in chunks, so that after taking a chunk we can do a fast sequential loop over it.
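To put some hypothetical numbers on that (say 8 threads and 1000 inputs):

number_of_chunks = 20 * 8          # 160
chunk_size = max(1, 1000 ÷ 160)    # 6
# => one channel take (one lock acquisition) pays for roughly 6 sequential
#    solve! calls, while each task's Solver caches are reused across every
#    chunk that task pulls.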
There may be a more elegant and more efficient way to write this; I'm not sure, since I'm not an experienced user of channels or this specific pattern.
Mostly, we gain composability: one function can do multithreading without worrying about whether some other function is also doing multithreading. We also gain a lot of general flexibility beyond just statically scheduling a for loop, which is all the old scheduler was really capable of.
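For instance, because solve2 only ever uses @spawn, you could call it from inside other task-parallel code without special care (a hypothetical sketch; batches is just some partition of your workloads):

using Base.Threads: @spawn

tasks = [@spawn solve2(batch) for batch in batches]
all_solutions = map(fetch, tasks)  # collect each batch's Vector{Solution}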