Multithreading with dynamic scheduler

lucas711642 · August 19, 2020, 12:25am

I have an expensive function that I need to evaluate for several inputs. To use multithreading, I usually use the following workflow:

inputs = collect(deepcopy(input) for _ = 1 : Threads.nthreads())
Threads.@threads for k in iterable
	id = Threads.threadid()
	change_input!(inputs[id], k)
	expensive_function(inputs[id])
	restore_input!(inputs[id], k)
end

Since the time per iteration can vary, I want to use a dynamic scheduler with @spawn. To do so, I must ensure that a thread does not switch tasks before it has finished the entire loop iteration; otherwise two tasks could end up using the same inputs[id]. How can I do that?

It seems to me that I cannot do this without overwriting the wait function. Is that correct?

Thanks in advance

lungben · August 19, 2020, 6:25am

Could you refactor your code to use a pure function for parallel execution?
Then you could do

f(x) = x^2 # some expensive (pure) function
inputs = 1:10
futures = [Threads.@spawn f(x) for x in inputs]
# maybe more code here which does not depend on results
results = fetch.(futures)

biona001 · August 19, 2020, 6:36am

You might also consider using ThreadPools.jl, which exposes the @qthreads macro for a dynamically scheduled for loop.

GunnarFarneback · August 19, 2020, 2:27pm

Maybe something like

function worker(input, channel)
    while true
        k = take!(channel)
        isnothing(k) && break
        change_input!(input, k)
        expensive_function(input)
        restore_input!(input, k)
    end
end

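N = Threads.nthreads()  # number of worker tasks
channel = Channel(N)
workers = Task[]
for i = 1:N
    # each task gets its own private copy of input
    push!(workers, Threads.@spawn worker(deepcopy(input), channel))
end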

for k in iterable
    put!(channel, k)
end

for i = 1:N
    put!(channel, nothing)
end

for task in workers
    wait(task)
end
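
For readers who want to run the pattern above end to end, here is a minimal self-contained sketch. The Input struct and the toy bodies of change_input!, expensive_function and restore_input! are hypothetical stand-ins, not the original poster's code. It also swaps the nothing sentinels for close(channel), which ends the workers' loops in the same way, and collects results through a second channel.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-ins for the functions discussed in the thread.
mutable struct Input
    data::Vector{Float64}
end
change_input!(input, k) = (input.data[1] += k; input)
restore_input!(input, k) = (input.data[1] -= k; input)
expensive_function(input) = sum(abs2, input.data)

# Each worker owns one copy of the input and pulls jobs until the channel is closed.
function worker(input, jobs, results)
    for k in jobs                     # ends once `jobs` is closed and drained
        change_input!(input, k)
        put!(results, k => expensive_function(input))
        restore_input!(input, k)
    end
end

function run_all(input, iterable; ntasks = nthreads())
    jobs = Channel{Int}(ntasks)
    # Buffer large enough that workers never block when storing a result.
    results = Channel{Pair{Int,Float64}}(length(iterable))
    # deepcopy is called once per worker task, not once per iteration.
    workers = [@spawn worker(deepcopy(input), jobs, results) for _ in 1:ntasks]
    for k in iterable
        put!(jobs, k)
    end
    close(jobs)                       # used here instead of `nothing` sentinels
    foreach(wait, workers)
    close(results)
    return Dict(collect(results))
end

input = Input([1.0, 2.0, 3.0])
out = run_all(input, 1:8)
```

Because every copy belongs to a task rather than to a thread, it does not matter which thread the scheduler runs a given task on.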

lucas711642 · August 19, 2020, 7:34pm

I thought about calling input_copy = deepcopy(input) inside the for loop and then calling the pure function expensive_function(input_copy). The problem is that input occupies a lot of memory, so I would rather minimize the number of calls to deepcopy.
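
One way to keep one @spawn per iteration while capping the number of deepcopy calls is a pool of pre-made copies stored in a Channel. The following is only a sketch under assumptions: Input and the toy function bodies are hypothetical placeholders, and run_with_pool is a name made up for this illustration.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-ins for the functions discussed in the thread.
mutable struct Input
    data::Vector{Float64}
end
change_input!(input, k) = (input.data[1] += k; input)
restore_input!(input, k) = (input.data[1] -= k; input)
expensive_function(input) = sum(abs2, input.data)

function run_with_pool(input, iterable; ncopies = nthreads())
    # The pool holds `ncopies` independent copies; deepcopy is called only here.
    pool = Channel{typeof(input)}(ncopies)
    for _ in 1:ncopies
        put!(pool, deepcopy(input))
    end
    tasks = map(iterable) do k
        @spawn begin
            buf = take!(pool)         # blocks until a copy is free
            try
                change_input!(buf, k)
                result = expensive_function(buf)
                restore_input!(buf, k)
                result
            finally
                put!(pool, buf)       # hand the copy back to the pool
            end
        end
    end
    return fetch.(tasks)              # results in the order of `iterable`
end

input = Input([1.0, 2.0, 3.0])
results = run_with_pool(input, 1:8)
```

A task holds its copy exclusively between take! and put!, so the scheduler is free to move tasks between threads. Note that if expensive_function throws, the copy goes back into the pool without having been restored.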

lucas711642 · August 19, 2020, 7:35pm

I didn’t know about ThreadPools. Thanks!

lucas711642 · August 19, 2020, 7:37pm

Actually, I think I cannot guarantee that each worker is assigned to its own thread. Maybe this is something where I can use the @tspawnat macro of the ThreadPools package suggested by @biona001.

lungben · August 19, 2020, 8:01pm

If your expensive_function does not modify input (and only returns its output), there is no need to copy it.
If it does need to modify the input, maybe there is a way to split input into separate elements (e.g. columns in a matrix) where each instance of expensive_function only operates on a distinct part of it? Then a (distinct) view on input could be passed to each instance of the expensive_function.
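
A minimal sketch of the view idea, assuming the work can be split by matrix column. Here normalize_column! is a hypothetical in-place kernel standing in for expensive_function:

```julia
using Base.Threads: @spawn

# Hypothetical in-place kernel: scales a column so that its entries sum to 1.
function normalize_column!(col)
    col ./= sum(col)
    return nothing
end

A = [1.0 2.0 3.0;
     3.0 2.0 1.0]

# One task per column. The views do not overlap, so the tasks never write
# to the same memory and no copies are needed.
tasks = [@spawn normalize_column!(view(A, :, j)) for j in axes(A, 2)]
foreach(wait, tasks)
```

This only works when each task writes to its own part of the data; any state shared between the parts must be read-only.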

GunnarFarneback · August 19, 2020, 8:07pm

Why do you need that guarantee?

lucas711642 · August 19, 2020, 8:34pm

For each iteration, I first modify some of the elements of input (a struct basically composed of Vector{Float64} fields) and then I call expensive_function. Indeed, I could split input into two different variables where only one of them is modified in each iteration. The only problem is that it would take a lot of work to adapt expensive_function to this case.

lucas711642 · August 19, 2020, 8:35pm

Indeed, I didn’t pay attention to the fact that in your code each deepcopy(input) is bound to a specific task, rather than to a specific thread.

I’ll definitely test your solution. Thank you!
