How to use many (non-sticky) tasks while maximizing local storage reuse?

The use of Iterators.partition is not ideal for this task, because it takes the chunk size (not the number of chunks) and may produce chunks of uneven lengths, for example:

julia> length.(Iterators.partition(1:10,4))
3-element Vector{Int64}:
 4
 4
 2

which is why I wrote the (simple but convenient) ChunkSplitters package, which splits the range as evenly as possible:

julia> using ChunkSplitters

julia> length.(map(first,chunks(1:10,3)))
3-element Vector{Int64}:
 4
 3
 3

Then, if you do not want to spawn many tasks, I prefer the following pattern:

using ChunkSplitters
using Base.Threads: @threads

function solve(solvers, inputs; number_of_chunks=length(solvers))
    @threads for (i_range, i_chunk) in chunks(inputs, number_of_chunks)
        solver = solvers[i_chunk]  # one solver per chunk, reused for all inputs of that chunk
        for i in i_range
            input = inputs[i]
            # ... solve for `input` using `solver` ...
        end
    end
    # reduce results
    return # ...
end

This has two good properties: 1) it only spawns the number of tasks associated with the number of chunks requested; 2) it has essentially no overhead relative to a simple @threads-ed loop. (You could do the same with @sync and @spawn, as sketched below, but there would be no advantage here.) To improve load balancing you can simply increase the number of chunks, or use the :scatter chunking option if there is a correlation between the index of the task and its cost.
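For reference, here is a minimal sketch of that @sync/@spawn variant, using the same chunks(inputs, number_of_chunks) iteration as above; solve_spawn is just a placeholder name and the solver body is still elided:

using ChunkSplitters
using Base.Threads: @spawn

function solve_spawn(solvers, inputs; number_of_chunks=length(solvers))
    @sync for (i_range, i_chunk) in chunks(inputs, number_of_chunks)
        @spawn begin
            solver = solvers[i_chunk]  # one solver per chunk, thus per spawned task
            for i in i_range
                input = inputs[i]
                # ... solve for `input` using `solver` ...
            end
        end
    end
    # reduce results
    return # ...
end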

If you don’t mind spawning many tasks, you can take the solvers from a Channel directly, and there is no need for chunking:

using Base.Threads: @spawn

function solve(solvers, inputs)
    @sync for input in inputs
        @spawn begin
            solver = take!(solvers)  # blocks until a solver is available in the channel
            # ... solve for `input` using `solver` ...
            put!(solvers, solver)    # return the solver to the channel
        end
    end
    # reduce results
    return # ...
end
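For completeness, a sketch of how the solvers channel might be set up and the function called. The concrete choices here (random inputs, a preallocated buffer playing the role of a "solver", one solver per thread) are only illustrative assumptions:

using Base.Threads: nthreads

inputs = rand(100)                            # hypothetical inputs
solvers = Channel{Vector{Float64}}(nthreads())
for _ in 1:nthreads()
    put!(solvers, zeros(1000))                # pre-fill: one work buffer per concurrent task
end

solve(solvers, inputs)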

You don’t need chunking because take! on the channel blocks: each spawned task will only run its work when a solver is available from the channel. That is better if the tasks are very uneven and if the time required for spawning the tasks is negligible relative to the time of each task.

ps: Still, I think there is something about channels that I don’t get exactly right… When I experiment with them I often reach locked states without understanding exactly why; it seems I can end up with a stalled take! call with the pattern above. This was the issue: when using channels, errors thrown inside the spawned tasks do not get raised, and the computation stalls (one way to guard against that is sketched below).
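If the stall comes from a task erroring between take! and put! (so the solver never goes back into the channel and the remaining take! calls block forever), a sketch of a guard is to always return the solver with try/finally; then @sync can finish and rethrow the task's exception:

using Base.Threads: @spawn

@sync for input in inputs
    @spawn begin
        solver = take!(solvers)
        try
            # ... solve for `input` using `solver` ...
        finally
            put!(solvers, solver)  # always return the solver, even if the work errors
        end
    end
end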
