I’m trying to optimize some code running in parallel. I’m struggling to figure out the best way to lay out this approach, and where to use pmap(), @sync, @async, @distributed, etc.
I am applying a null model randomization to each row of my data. The output for each row is written to a separate file, so the order in which the rows are processed doesn’t matter. Within each row, I simulate the null model a large number of times (here nsim = 100000). Each simulation is relatively quick, but the total time per row adds up and varies depending on the specific row being randomized.
Here is some mock code that gets the idea across:
using Distributed
addprocs(72)
@everywhere using CSV
# placeholder: the real data matrix (one row per unit) must be defined on every worker
@everywhere data
@everywhere nsim = 100000
@everywhere function run_null_model(my_data)
    # function for randomizing the distribution, and summarizing;
    # returns a single Float64 summary value
end
@everywhere function run_simulation(i)
    my_data = data[i, :]
    # Additional manipulation of my_data takes place here.
    # This is somewhat computationally intensive, with the time
    # varying a lot across rows.
    results = zeros(Float64, nsim)  # one entry per simulation
    # these simulations do not need to be conducted in order. they are random draws
    for j in 1:nsim
        results[j] = run_null_model(my_data)
    end
    # CSV.write expects a table, so wrap the vector in a NamedTuple column
    CSV.write(string("results_", i, ".csv"), (result = results,))
    return nothing
end
# run code in parallel
pmap(run_simulation, 1:size(data, 1))
Right now I am using pmap on the outermost function. I’ve also explored @sync @distributed, but I don’t really need to sync these up, as each process writes its own file (but maybe I’m misunderstanding what @sync does). And I cannot for the life of me figure out how I might be able to do a nested @sync/@async pair in here.
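For reference, this is roughly the @sync @distributed variant I explored, reduced to a minimal sketch. The name run_simulation_stub is a hypothetical stand-in for run_simulation, and the worker count and sleep times are arbitrary. As far as I understand it, @distributed without a reducer returns immediately, and @sync only makes the main process wait until every iteration has finished; @distributed also splits the range into fixed chunks up front rather than handing out rows one at a time.

```julia
using Distributed
addprocs(2)  # small worker count, just for illustration

# Hypothetical stand-in for run_simulation: pretends to work, then returns the row index.
@everywhere function run_simulation_stub(i)
    sleep(0.01 * i)  # fake work whose duration varies by row
    return i
end

# Without @sync this loop would return a Task right away while the workers keep running.
# With @sync the main process blocks here until all iterations are done.
@sync @distributed for i in 1:8
    run_simulation_stub(i)
end
println("all rows done")
```

If the script has nothing else to do afterwards, the @sync mainly prevents it from exiting before the workers have written their files.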
I’ve also tried nesting another parallel call when running the simulations (e.g., @sync @distributed for j = 1:nsim) but this doesn’t seem to provide any speedup.
Are there any obvious changes that can speed things up? Is there anything I’m doing “wrong” here?
EDIT: Just to add, the computation time for these operations is not consistent across rows of my data. That is, some rows take a second to complete the whole run_simulation block, some take 60 seconds, depending on the attributes of that specific row. My goal is to minimize downtime, hence putting the pmap at the outermost level.
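To make the uneven timing concrete, here is a minimal sketch with sleep standing in for the real work. The name fake_row, the worker count, and the durations are all made up for illustration. It records which worker handled each row, so the per-worker counts show how pmap hands out rows as workers become free.

```julia
using Distributed
addprocs(4)  # arbitrary worker count for the sketch

# Hypothetical stand-in for run_simulation: every fifth "row" is slow.
@everywhere function fake_row(i)
    sleep(i % 5 == 0 ? 0.5 : 0.05)
    return (row = i, worker = myid())
end

# pmap gives each worker a new row as soon as it finishes the previous one.
assignments = pmap(fake_row, 1:20)

# Count how many rows each worker ended up processing.
for w in workers()
    n = count(a -> a.worker == w, assignments)
    println("worker $w handled $n rows")
end
```

The exact split depends on timing, so the counts will differ between runs.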