How to use many (non-sticky) tasks while maximizing local storage reuse?

The use of Iterators.partition is not ideal for this task, because it takes the chunk size (not the number of chunks) and may produce chunks of uneven lengths, for example:

julia> length.(Iterators.partition(1:10,4))
3-element Vector{Int64}:
 4
 4
 2

which is why I wrote the (simple but convenient) ChunkSplitters package, which splits the range as evenly as possible:

julia> using ChunkSplitters

julia> length.(map(first,chunks(1:10,3)))
3-element Vector{Int64}:
 4
 3
 3

Then, if you do not want to spawn many tasks, I prefer the following pattern:

using ChunkSplitters
using Base.Threads: @threads

function solve(solvers, inputs; number_of_chunks=length(solvers))
    @threads for (i_range, i_chunk) in chunks(inputs, number_of_chunks)
        solver = solvers[i_chunk]  # one solver per chunk, reused for all inputs of that chunk
        for i in i_range
            input = inputs[i]
            # ... solve for `input` using `solver` ...
        end
    end
    # reduce results
    return # ...
end

This has two good properties: 1) it only spawns the number of tasks associated with the number of chunks requested; 2) it has essentially no overhead relative to a simple @threads-ed loop. (You could do the same with @sync and @spawn, as sketched below, but there would be no advantage here.) To improve load balancing you can simply increase the number of chunks, or use the :scatter chunking option if there is a correlation between the index of the task and its cost.
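For reference, here is a minimal sketch of that @sync/@spawn variant, using the same chunks(inputs, number_of_chunks) iteration as above; solve_spawn is just a placeholder name and the solver body is still elided:

using ChunkSplitters
using Base.Threads: @spawn

function solve_spawn(solvers, inputs; number_of_chunks=length(solvers))
    @sync for (i_range, i_chunk) in chunks(inputs, number_of_chunks)
        @spawn begin
            solver = solvers[i_chunk]  # one solver per chunk, thus per spawned task
            for i in i_range
                input = inputs[i]
                # ... solve for `input` using `solver` ...
            end
        end
    end
    # reduce results
    return # ...
end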

If you don’t mind spawning many tasks, you can take the solvers from a Channel directly, and there is no need for chunking:

using Base.Threads: @spawn

function solve(solvers, inputs)
    @sync for input in inputs
        @spawn begin
            solver = take!(solvers)  # blocks until a solver is available in the channel
            # ... solve for `input` using `solver` ...
            put!(solvers, solver)    # return the solver to the channel
        end
    end
    # reduce results
    return # ...
end
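For completeness, a sketch of how the solvers channel might be set up and the function called. The concrete choices here (random inputs, a preallocated buffer playing the role of a "solver", one solver per thread) are only illustrative assumptions:

using Base.Threads: nthreads

inputs = rand(100)                            # hypothetical inputs
solvers = Channel{Vector{Float64}}(nthreads())
for _ in 1:nthreads()
    put!(solvers, zeros(1000))                # pre-fill: one work buffer per concurrent task
end

solve(solvers, inputs)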

You don’t need chunking because take! on the channel blocks: each spawned task will only run its work when a solver is available from the channel. That is better if the tasks are very uneven and if the time required for spawning the tasks is negligible relative to the time of each task.

ps: Still, I think there is something about channels that I don’t get exactly right… When I experiment with them I often reach locked states without understanding exactly why; it seems I can end up with a stalled take! call with the pattern above. This was the issue: when using channels, errors thrown inside the spawned tasks do not get raised, and the computation stalls (one way to guard against that is sketched below).
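If the stall comes from a task erroring between take! and put! (so the solver never goes back into the channel and the remaining take! calls block forever), a sketch of a guard is to always return the solver with try/finally; then @sync can finish and rethrow the task's exception:

using Base.Threads: @spawn

@sync for input in inputs
    @spawn begin
        solver = take!(solvers)
        try
            # ... solve for `input` using `solver` ...
        finally
            put!(solvers, solver)  # always return the solver, even if the work errors
        end
    end
end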
