I have a “trivially parallel” computation to run. The function is loosely like this:
function process_one_iteration(i::Int64, input_directory::String)
    # e.g. reads input_directory/Input-1, Input-2, etc.
    x = read_input_from_directory(i, input_directory)
    y = do_something_intensive_to(x)   # takes about 10 minutes per core
    return y
end
Right now I handle this by doing
using Distributed

addprocs(100)                  # say 100 or so
all_inputs = collect(1:1000)   # or so
all_results = pmap(i -> process_one_iteration(i, input_directory), all_inputs)
This works fine: in the above example (1000 inputs, 100 cores, 10 minutes per input), it takes approximately (1000 / 100) * 10 = 100 minutes.
But this limits me to using exactly the n cores requested in addprocs(n), when it’s possible the HPC has additional resources available. What I would like to do is replace pmap with a function like this:
function pmap_dynamically(...)
    # Start with the workers already allocated.
    # Kick off all available jobs to those workers.
    # Once a worker finishes, *don't* send it a new job right away. Rather:
    #     Check the resources on the cluster.
    #     If there are sufficient unused resources:
    #         addprocs(a reasonable number of workers)
    #     If resources are running very tight:
    #         rmprocs(the recently finished worker, and/or a reasonable number of other idle workers)
    # *Then* resume allocating jobs to workers as per usual (until the next worker finishes and this scaling repeats).
end
I have already written the functions that check resources, so this part is fine.
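To make that pseudocode concrete, here is a rough, untested sketch of the kind of scheduler I imagine, built around a Channel of job indices and one feeder task per worker. spare_resources() and resources_tight() are stand-ins for the resource-checking functions I mentioned, and grow_by is just a placeholder for "a reasonable number of workers to add":

using Distributed

function pmap_dynamically(f, inputs; grow_by = 4)
    jobs    = Channel{Int}(length(inputs))
    results = Vector{Any}(undef, length(inputs))   # assumes inputs are the indices 1:N, as above
    foreach(i -> put!(jobs, i), inputs)
    close(jobs)                                    # nothing more will ever be queued

    feeders = Task[]

    # One feeder task per worker: pull a job index, run it remotely, then consider rescaling.
    function feed(pid)
        t = @async for i in jobs
            results[i] = remotecall_fetch(f, pid, i)
            if spare_resources()                   # room to grow: add workers, and feeders for them
                # (newly added workers would still need the code loaded on them, e.g. via @everywhere)
                foreach(feed, addprocs(grow_by))
            elseif resources_tight()               # too tight: retire *this* worker and stop feeding it
                rmprocs(pid)                       # (in practice I would make sure to keep at least one)
                break
            end
        end
        push!(feeders, t)
    end

    foreach(feed, workers())                       # start with the workers already allocated
    foreach(wait, feeders)                         # block until every feeder has drained the job queue
    return results
end

I would then call it the same way I currently call pmap, i.e. pmap_dynamically(i -> process_one_iteration(i, input_directory), all_inputs). I could presumably keep going down this road, but I would much rather adapt pmap than badly reinvent it.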
What I can’t quite figure out is how to make pmap play nice with this workflow. I looked at the source code for pmap and asyncmap, but am stymied by my ignorance of what they are really doing. It seems, though, that pmap is assuming a constant resource pool…?
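For what it’s worth, the most promising-looking entry point I have found is the pmap method that takes an explicit worker pool, something like the call below; what I can’t tell is whether such a pool can safely grow or shrink while pmap is iterating over it.

using Distributed

# pmap accepts an AbstractWorkerPool as its second argument; this is the method I am
# staring at. Whether workers can be added to or removed from the pool mid-run is
# exactly the part I don't understand.
pool = WorkerPool(workers())
all_results = pmap(i -> process_one_iteration(i, input_directory), pool, all_inputs)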
Since the per-iteration function takes about 10 minutes, this resource pool could be adjusted relatively safely (the code is not going to try to add or remove procs multiple times per minute, let alone per second).
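(And if it ever did become an issue, I assume I could simply rate-limit the check, roughly as below, where adjust_pool!() is a hypothetical wrapper around the addprocs / rmprocs decision above.)

# Hypothetical guard: look at cluster resources at most once per minute.
const last_check = Ref(0.0)

function maybe_adjust_pool!()
    if time() - last_check[] > 60
        last_check[] = time()
        adjust_pool!()    # hypothetical: the "grow or shrink?" decision sketched above
    end
end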
Is this doable?