How to use many (non-sticky) tasks while maximizing local storage reuse?

The two approaches are mostly equivalent, but the problem with using @threads :static is that if anything else in your code is multithreaded without your knowledge, the two will destructively interfere.

E.g. if solve! is actually doing some multithreaded stuff internally, you could end up reduced to worse than single-threaded speeds with @threads :static, but that is not the case with @spawn.
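For context, the thread-local-storage pattern this is warning about looks roughly like the sketch below (assuming the same Solver/solve!/Solution names as in your question). It relies on :static pinning each iteration block to a fixed thread so that indexing by threadid() is safe, and that pinning is exactly what composes badly:

using Base.Threads: @threads, nthreads, threadid

# Sketch of the :static pattern: one solver per thread, indexed by
# threadid(). Only safe because :static pins iterations to threads.
function solve_static(inputs)
    solutions = Vector{Solution}(undef, length(inputs))
    solvers = [Solver() for _ in 1:nthreads()]
    @threads :static for i in eachindex(inputs)
        solutions[i] = solve!(solvers[threadid()], inputs[i])
    end
    return solutions
end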

This should be possible if you combine a channel with the chunking approach as put forth here: Multithreading with shared memory caches - #6 by danielwe

For your case, that’d look something like

using Base.Threads: @spawn, nthreads

function solve2(inputs; number_of_chunks = 20 * nthreads())
    solutions = Vector{Solution}(undef, length(inputs))

    chunk_size = max(1, length(inputs) ÷ number_of_chunks)
    chunks = Iterators.partition(enumerate(inputs), chunk_size)

    # Queue up every chunk of (index, input) pairs, then close the
    # channel so the worker loops below terminate once it's drained.
    chunk_queue = Channel{eltype(chunks)}(Inf)
    foreach(chunk -> put!(chunk_queue, chunk), chunks)
    close(chunk_queue)

    @sync for _ ∈ 1:nthreads()
        @spawn begin
            # One solver per task, reused for every chunk this task takes.
            solver = Solver()
            for chunk ∈ chunk_queue
                for (i, input) in chunk
                    solutions[i] = solve!(solver, input)
                end
            end
        end
    end
    return solutions
end
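If you want to test the pattern end to end, toy stand-ins for the types are enough (these definitions are just placeholders for whatever your real Solver and solve! do):

# Toy stand-ins, just to exercise solve2; the scratch buffer plays the
# role of the local storage each task reuses across inputs.
const Solution = Float64

struct Solver
    scratch::Vector{Float64}
end
Solver() = Solver(zeros(1024))

function solve!(solver::Solver, input)
    solver.scratch .= input   # pretend work that uses the reusable buffer
    return sum(solver.scratch)
end

solutions = solve2([rand(1024) for _ in 1:10_000])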

In this case, we only ever create nthreads() tasks, but with the default number_of_chunks there are roughly 20 chunks per task, and each task pulls chunks to work on from the shared Channel.

Taking from the channel has some overhead due to locking, so we don't want to do it on every iteration. That's why the work is stored in chunks: after taking one, a task can run a fast sequential loop over it before going back to the channel.
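For contrast, a chunk-free version would take one (index, input) pair at a time, paying that locking overhead on every single iteration (a fragment, assuming the same variables as in solve2):

# Per-item queue: every iteration goes through the channel's lock,
# which is the overhead chunking amortizes away.
item_queue = Channel{Tuple{Int, eltype(inputs)}}(Inf)
foreach(pair -> put!(item_queue, pair), enumerate(inputs))
close(item_queue)

@sync for _ ∈ 1:nthreads()
    @spawn begin
        solver = Solver()
        for (i, input) in item_queue
            solutions[i] = solve!(solver, input)
        end
    end
end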

There may be a more elegant and more efficient way to write this; I'm not an experienced user of channels or this specific pattern.

Mostly, we gain composability: one function can do multithreading without worrying about whether some other function is also doing multithreading. We also gain a lot of flexibility beyond just statically scheduling a for loop, which is all the old scheduler was really capable of.
