I think I got close to what I wanted by combining Threads.@spawn with @sync around the nested loops, so that all spawned tasks finish before the next iteration of the outer loop starts:
for t in 1:10 # Say we want to do the full parallelized process 10 times
    results_matrix = zeros(2, 2)
    @sync for i in 1:2
        for j in 1:2
            Threads.@spawn begin
                results_matrix[i, j] = remote_computation()
            end
        end
    end
end
However, I do not see all 4 CPUs at 100%, and some threads are still used several times within the same parallelized iteration.
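To make that observation concrete, here is a minimal sketch of how the thread assignment can be recorded per task. It assumes Julia was started with several threads (for example `julia -t 4`), and `busy_work` is a hypothetical CPU-bound stand-in for my `remote_computation`, not the real function. The recorded id is the thread the task starts on; as far as I understand, recent Julia versions may migrate a task to another thread afterwards, so treat the ids as indicative only.

```julia
# Hypothetical CPU-bound stand-in for remote_computation (assumption for illustration).
function busy_work(n)
    s = 0.0
    for k in 1:n
        s += sin(k)
    end
    return s
end

# Run the 2x2 grid of tasks once and record which thread each task started on.
function run_once(n)
    results_matrix = zeros(2, 2)
    thread_ids = zeros(Int, 2, 2)
    @sync for i in 1:2, j in 1:2
        Threads.@spawn begin
            # Thread id observed when the task starts running.
            thread_ids[i, j] = Threads.threadid()
            # Each task writes to its own element, so there is no data race.
            results_matrix[i, j] = busy_work(n)
        end
    end
    return results_matrix, thread_ids
end

println("Threads available: ", Threads.nthreads())
results, ids = run_once(10^6)
println("Thread id per task: ", ids)
```

If the same id shows up more than once in `ids`, the scheduler placed several tasks on one thread. `Threads.@spawn` only hands tasks to the scheduler and does not promise one task per thread, so this may be expected behaviour rather than a bug in the loop.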
PS: As you suggested, I used a real remote computation function that does not rely on sleep.