I intend to solve N mutually independent tasks in parallel, with each task running on a single CPU. I found some example code online and revised it into the MWE below.
I was wondering:
- Why is there a for loop over all CPUs, as in the line `for p = 1:np`? Will this decrease the speed?
- Given the type of parallel tasks I have, what would be the fastest way to do it? (See also the `pmap` sketch after the MWE below.)
I appreciate any comments.
Here is the MWE:
```julia
using Distributed
addprocs(Sys.CPU_THREADS - 1)

# A simple function: each call sleeps for 5 seconds and reports which worker ran it
@everywhere function fun_sleep()
    sleep(5)
    println("I'm worker number $(myid()), and I reside on machine $(gethostname()).")
    return myid()
end

# Parallel computing
function fun_parallel(fun_sleep::Function, num_task::Int64)
    temp = Vector(undef, num_task)
    np = nprocs()
    i = 1
    nextidx() = (idx = i; i += 1; idx)  # hands out the next unprocessed task index
    @sync begin
        for p = 1:np  # what is this for loop doing in here?
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > num_task
                            break
                        end
                        temp[idx] = remotecall_fetch(fun_sleep, p)
                    end
                end
            end
        end
    end
    return temp
end

num_task = 10
@time result = fun_parallel(fun_sleep, num_task)
```
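For what it's worth, my current guess is that the built-in `pmap` already does this kind of dynamic scheduling over the workers internally, so something like the following sketch might be an equivalent (and simpler) way to run the same independent tasks. Am I right that this would be at least as fast as the MWE above? (The anonymous function wrapping `fun_sleep` is just my way of mapping over `1:num_task`; the helper itself is the same toy task.)

```julia
using Distributed
addprocs(Sys.CPU_THREADS - 1)

# Same toy task as above: sleep 5 seconds and return the worker id
@everywhere fun_sleep() = (sleep(5); myid())

num_task = 10
# pmap hands each of the num_task calls to whichever worker is free next
@time result = pmap(_ -> fun_sleep(), 1:num_task)
```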