The two approaches are mostly equivalent, but the problem with using @threads :static is that if there's anything else multithreaded going on elsewhere in your code that you're not aware of, it'll destructively interfere. E.g. if solve! is actually doing some multithreaded work internally, you could end up being reduced to worse than single-threaded speeds if you use @threads :static, but that is not the case with @spawn.
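To make that concrete, here is a minimal sketch of the two loop styles (reusing the Solver, Solution, solve!, and inputs names from your code; I'm assuming solve!(solver, input) returns a Solution, and I've left out per-task solver reuse here to keep the scheduling contrast clear):

using Base.Threads: @threads, @spawn

solutions = Vector{Solution}(undef, length(inputs))

# Static scheduling: the iteration range is split up front and each block is
# pinned to one thread, so any tasks solve! spawns internally can interfere
# with the pinned outer iterations.
@threads :static for i in eachindex(inputs)
    solutions[i] = solve!(Solver(), inputs[i])
end

# Task-based scheduling: the runtime is free to interleave these tasks with
# whatever solve! spawns internally.
@sync for i in eachindex(inputs)
    @spawn solutions[i] = solve!(Solver(), inputs[i])
end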
This should be possible if you combine a channel with the chunking approach as put forth here: Multithreading with shared memory caches - #6 by danielwe
For your case, that’d look something like
using Base.Threads: @spawn, nthreads

function solve2(inputs; number_of_chunks = 20 * nthreads())
    solutions = Vector{Solution}(undef, length(inputs))
    chunk_size = max(1, length(inputs) ÷ number_of_chunks)
    chunks = Iterators.partition(enumerate(inputs), chunk_size)
    chunk_queue = Channel{eltype(chunks)}(Inf)
    foreach(chunk -> put!(chunk_queue, chunk), chunks)
    close(chunk_queue)
    @sync for _ ∈ 1:nthreads()
        @spawn begin
            # one Solver (and its caches) per task, reused across all its chunks
            solver = Solver()
            for chunk ∈ chunk_queue
                for (i, input) in chunk
                    solutions[i] = solve!(solver, input)
                end
            end
        end
    end
    return solutions
end
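You'd then call it like this (inputs being your vector of problem instances, and the keyword argument purely optional):

solutions = solve2(inputs)                          # default: ~20 chunks per thread
solutions = solve2(inputs; number_of_chunks = 100)  # override the chunk count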
In this case, we only ever create nthreads() different tasks, but there are by default 20 chunks per task, and each task pulls chunks to work on from the shared Channel.
Taking from the channel has some overhead due to locks, so we don't want to do it every iteration; that's why we store the work in chunks, so that after taking a chunk we can do a fast sequential loop over it.
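To put some hypothetical numbers on that (say 8 threads and 1000 inputs):

number_of_chunks = 20 * 8          # 160
chunk_size = max(1, 1000 ÷ 160)    # 6
# => one channel take (one lock acquisition) pays for roughly 6 sequential
#    solve! calls, while each task's Solver caches are reused across every
#    chunk that task pulls.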
There may be a more elegant and more efficient way to write this; I'm not sure, since I'm not an experienced user of channels or this specific pattern.
Mostly, we gain composability: one function can do multithreading without worrying about whether some other function is also doing multithreading. We also gain a lot of general flexibility beyond just statically scheduling a for loop, which is all the old scheduler was really capable of.
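For instance, because solve2 only ever uses @spawn, you could call it from inside other task-parallel code without special care (a hypothetical sketch; batches is just some partition of your workloads):

using Base.Threads: @spawn

tasks = [@spawn solve2(batch) for batch in batches]
all_solutions = map(fetch, tasks)  # collect each batch's Vector{Solution}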