Hey!
I would like to run a job several times in parallel (as many times as the number of available threads), and I would like to do this several times in a row. MWE:
```julia
function mytask() # The task to be parallelized
    println(Threads.threadid())
    x = 1.0
    for _ in 1:1e9
        x += rand()
    end
end
```
```julia
for i in 1:2 # Say we do two iterations
    println("Iteration $i")
    @sync for _ in 1:Threads.nthreads()
        Threads.@spawn begin
            mytask()
        end
    end
end
```
Unfortunately, when I do this, say with 4 threads, the first iteration is perfectly parallelized, but starting from the second iteration one thread runs the task several times sequentially, yielding the following output:
```
Iteration 1
1
3
2
4
Iteration 2
1
3
1
1
```
Thus, iteration 2 takes three times as long as iteration 1 to complete. If I sketch the execution of threads versus time (a colored cell means the thread is running), it looks like the following:
What is this for? I get that you want two iterations, but does your real task count depend on the number of threads? Are you studying something related to threading?
btw, it just does whatever the scheduler thinks is best:
```julia
julia> for i in 1:2 # Say we do two iterations
           println("Iteration $i")
           @sync for _ in 1:Threads.nthreads()
               Threads.@spawn begin
                   mytask()
               end
           end
       end
Iteration 1
1
4
3
2
Iteration 2
1
4
2
3
```
Unless you're seeing specific issues with this (e.g. if you have a thread-local cache), I'd say this is fine.
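To illustrate the point about the scheduler, here is a minimal sketch (assuming Julia started with multiple threads): the thread each spawned task lands on is the scheduler's choice, so a repeated `threadid()` does not by itself imply serial execution, and since Julia 1.7 tasks may even migrate between threads.

```julia
# Sketch: the scheduler decides which thread each spawned task runs on,
# so the same threadid() may legitimately appear more than once.
ids = [fetch(Threads.@spawn Threads.threadid()) for _ in 1:8]
println(ids)  # placement and ordering are entirely up to the scheduler
```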
My real task count does not depend on the number of threads. I am working on machine learning, and the task is actually a kind of Monte Carlo simulation. In my real application I need to run 25 tasks in parallel, and I have up to 36 available threads on a remote machine.
Note that the parallelized tasks are embarrassingly parallel: they do not share any memory or data.
Do you mean that there is some kind of optimization behind this? Any idea how to be 100% sure to reproduce the same "perfect parallelization" of iteration 1 in iteration 2?
I could not achieve full parallelism with `ThreadPools.tmap` or `ThreadPools.tforeach`, though I succeeded using `Threads.@threads for` (see below).
```julia
for i in 1:2 # Say we do two iterations
    println("Iteration $i")
    Threads.@threads for _ in 1:4
        mytask()
    end
end
```
However, this forces the user to have a single outer for loop encompassing the multithreaded code, which is less convenient than `@spawn` IMO, and would be a problem with nested multithreaded for loops, for instance. I'd be interested in achieving fully parallelized code using something like `@spawn`. I remember being able to do that with `apply_async` in Python, by creating a pool of jobs wherever I wanted in nested for loops and "getting" them afterwards (executing the tasks and fetching the results).
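For what it's worth, a pattern analogous to Python's `apply_async` + `get` can be sketched with `Threads.@spawn` and `fetch`: spawn the jobs wherever convenient, keep the `Task` handles, and fetch the results later. The task body below (`montecarlo_sample`) is a hypothetical stand-in for the real Monte Carlo job:

```julia
# Hypothetical stand-in for the real Monte Carlo task.
montecarlo_sample(n) = sum(rand() for _ in 1:n) / n

# Spawn the jobs (this could happen anywhere, e.g. inside nested loops),
# keep the Task handles, and fetch the results when they are needed.
tasks = [Threads.@spawn montecarlo_sample(10^6) for _ in 1:25]
results = fetch.(tasks)  # blocks until all 25 tasks have finished
```

Unlike `@sync`, this returns the task results directly, and the spawning and the fetching can live in different parts of the code.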
Yes, I would like this! However, my tasks are pretty long and of similar length. I'd be happy just to be able to run the 25 tasks in parallel.
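As a side note, if guaranteeing exactly one task per thread matters, the `:static` schedule of `Threads.@threads` (the schedule argument is available since Julia 1.5) pins iteration chunks to threads; a minimal sketch:

```julia
# Sketch: with the :static schedule and exactly nthreads() iterations,
# each thread runs exactly one iteration (iteration i on thread i).
ids = zeros(Int, Threads.nthreads())
Threads.@threads :static for i in 1:Threads.nthreads()
    ids[i] = Threads.threadid()
end
```

The trade-off is the one discussed above: `:static` forbids nesting one `@threads :static` loop inside another.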