Hey! I’d like to run 4 independent tasks in parallel a certain number of times using 4 cores. Here is the script, say `test.jl`, which I run with the command `julia --threads 4 test.jl`:
```julia
using Distributed

function remote_computation()
    thid = Threads.threadid()
    println("Remote computation with thread $thid")
    sleep(5) # Heavy computation: to be parallelized
    # (usually I would put some real computation here but for reproducibility I used sleep)
    rand()
end

addprocs(Threads.nthreads()) # Add as many processes as available threads, 4 in my case

for t in 1:10 # Say we want to do the full parallelized process 10 times
    results_matrix = zeros(2, 2)
    @sync for i in 1:2
        for j in 1:2
            @async Threads.@spawn results_matrix[i, j] = remote_computation()
        end
    end
    println("Completed parallelized process number: $t")
end
```
Notice that I have a nested `for` loop because, in my real application, `remote_computation` depends on `i` and `j`, which explains the two levels of iteration. I removed this dependency here for the sake of simplicity.
Expectation: I expect that, in each of the 10 iterations of the outermost loop, the 4 tasks run in parallel and finish at about the same time. I also expect `htop` to show my 4 CPUs 100% busy during the whole execution (although in this MWE, using `sleep` may not actually keep them busy, right?).
In my understanding, this would give the following output:

```
Remote computation with thread 2
Remote computation with thread 3
Remote computation with thread 1
Remote computation with thread 4
Completed parallelized process number: 1
Remote computation with thread 1
Remote computation with thread 3
Remote computation with thread 2
Remote computation with thread 4
Completed parallelized process number: 2
```

and so on (10 times in total), with the 4 CPUs 100% busy all along.
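On the `sleep` question: `sleep` suspends the calling task and hands control back to the scheduler, so it does not consume CPU time and `htop` will not show the cores as busy. To check CPU usage in the MWE, a CPU-bound stand-in can replace it. The sketch below is a hypothetical helper (`busy_work` is not part of the original script) that simply spins on arithmetic until the requested wall-clock time has elapsed.

```julia
# Hypothetical CPU-bound stand-in for `sleep(seconds)`:
# keeps the current thread busy (visible in `htop`) instead of suspending the task.
function busy_work(seconds::Real)
    t0 = time()
    x = 0.0
    while time() - t0 < seconds
        x += rand()  # do some arithmetic each iteration so the thread stays busy
    end
    return x  # return the accumulated value so the work is not discarded
end

busy_work(0.5)  # spins for roughly half a second
```

In `remote_computation`, `sleep(5)` would then become `busy_work(5)`.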
Reality: However, my expectation is only met for the first of the 10 iterations. After that, the jobs run almost sequentially (only 2 threads active in parallel), and my cores are about 30% busy, with one at 100%. This is visible in the output, where the same thread ID repeats starting from iteration 2:
```
Remote computation with thread 2
Remote computation with thread 3
Remote computation with thread 1
Remote computation with thread 4
Completed parallelized process number: 1
Remote computation with thread 1 # thread 1 used for 3 jobs out of 4
Remote computation with thread 1
Remote computation with thread 1
Remote computation with thread 4
Completed parallelized process number: 2
```

and so on (10 times in total).
I tried to follow this discussion, using `@sync` to make my script wait until the 4 tasks have finished and `@async` to have them execute in parallel, but apparently I missed something here.
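For comparison, here is a minimal thread-only variant of the loop, sketched under the assumption that all the work can stay in a single Julia process started with `--threads 4`. It drops both `addprocs` and `@async`, spawns one `Threads.@spawn` task per `(i, j)` cell, and waits for them with `fetch`, which blocks until each task finishes and returns its value. `compute_cell` and `run_batch` are hypothetical names, and this is only a sketch to compare against, not a confirmed explanation of the behavior above.

```julia
# Hypothetical per-cell computation; in the real application it depends on i and j.
function compute_cell(i, j)
    println("Computation ($i, $j) on thread $(Threads.threadid())")
    sleep(0.5)  # stand-in for real work, as in the original MWE
    return rand()
end

function run_batch()
    # One task per (i, j) cell: a 2x2 matrix of Tasks scheduled on the thread pool.
    tasks = [Threads.@spawn compute_cell(i, j) for i in 1:2, j in 1:2]
    # fetch waits for each task and returns its result, so this only
    # returns once all 4 computations have finished.
    return fetch.(tasks)
end

for t in 1:3
    results_matrix = run_batch()
    println("Completed parallelized process number: $t")
end
```

Collecting the tasks and fetching them makes the waiting explicit, so there is no question of which tasks a `@sync` block is actually tracking.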