Hi all!
I am writing some code that repeatedly calls a function which is executed in a distributed fashion. Below you find some pseudocode illustrating the problem. In outer_function I followed the suggestion in the manual (Shared Arrays Manual).
My question: the profiling results suggest that @async
always needs runtime dispatch. Is there a way to get rid of the runtime dispatch?
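To make it concrete, here is a minimal illustration of what I mean (illustrative only, not my actual code): the value produced by a Task is not inferable, so anything consuming fetch of an @async task is dynamically dispatched unless annotated.

```julia
# Illustrative only: a Task's result type is not inferable by the
# compiler, so downstream code using `fetch(t)` hits runtime dispatch.
t = @async 1 + 1
x = fetch(t)        # inferred as Any -> runtime dispatch downstream
y = fetch(t)::Int   # a type assertion restores inference at the call site
```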
Comments:
- Running the entire code in parallel might be difficult because some changes need to be made in the main outer loop before outer_function can be called.
- I tried to get rid of the distributed part and just use multithreading, but the distributed solution is about twice as fast.
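For context, a sketch of what such a multithreaded variant could look like (not my actual code; the inner work and the chunked data layout are placeholders):

```julia
using Base.Threads

# Sketch of a multithreaded alternative (illustrative; the real inner
# computation is more elaborate). Each iteration handles one data chunk.
function outer_function_threaded(parameters::Vector{F}, data_chunks) where F
    results = zeros(F, length(data_chunks))
    @threads for j in eachindex(data_chunks)
        # placeholder for the real per-chunk computation
        results[j] = sum(data_chunks[j]) * first(parameters)
    end
    return sum(results)
end
```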
using Distributed
addprocs(15)
@everywhere using DistributedArrays, SharedArrays
function inner_function(result, parameters, data)
    local_ind = DistributedArrays.localpartindex(data)
    data_local = DistributedArrays.localpart(data)
    inner_result = 0
    # Some more code, which includes a threaded for loop over data_local
    # and which will update inner_result
    result[local_ind] = inner_result
    nothing
end
function outer_function(parameters::Vector{F}, data) where F
    # some more code here where something is done with parameters
    # note: `init` must be a function of the array, not a value
    results = SharedArray{F, 1}(length(procs(data));
                                init = S -> (S[localindices(S)] .= zero(F)),
                                pids = vec(procs(data)))
    @sync begin
        for j in procs(data)
            @async remotecall_wait(inner_function, j, results, parameters, data)
        end
    end
    # in the actual code this is a bit more elaborate
    return_value = sum(results)
    return return_value
end
# prepare the data
data_raw = rand(10, 1000, 15)
data = distribute(data_raw, procs=workers(), dist=[1, 1, nworkers()])
parameters = rand(1000)
n_steps = 1000
# run the main loop
for step in 1:n_steps
    # do something to the parameters
    # ALL WORKERS NEED TO WAIT FOR THIS UPDATE TO BE DONE
    tmp_val = outer_function(parameters, data)
    # write tmp_val
end
Thank you so much for your help and suggestions!