I have a large data structure `g` (passed in to my parent function) on which I would like to call a moderately expensive function a large number of times (n = 1e4 to 1e6), each time with a different argument taken from `argList`. The function results will be completely independent of each other, which means that parallelism should be easy. However, I’m struggling with syntax and mechanics. Here’s what I’ve tried / thought:
1) `resultList = pmap(x -> myfunction(g, x), argList)`: this works, but takes ~92 seconds (serial takes ~5 seconds). I believe it’s because `g` is being passed to the workers each time, and this is a lot of data transfer.
2) `@parallel (f) for a in argList`: I’m not sure how to do this. I can create an `f`, but it will take multiple arguments in order to work (it mutates an accumulator vector), and I’m unsure how to pass multiple arguments into a `@parallel` for loop. I have not found a suitable example anywhere.
3) ParallelDataTransfer.jl: this looks like I’d be able to pass `g` once to each worker, but then I don’t know how to rewrite `pmap` so that it uses the worker’s local copy of `g`.
Advice / comments appreciated. This is a simplified version of the actual problem; for the real code for option 1) above, please see https://gist.github.com/sbromberger/91dc64cf3c6ff18ef9fb481db0795eed.
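For concreteness, here is a minimal sketch of the shape of option 1). It is not the real code (that is in the gist): the body of `myfunction`, the contents of `g` and `argList`, the wrapper name `run_all`, and the worker count are all made-up stand-ins. It is written for Julia 1.x, where `pmap` lives in the `Distributed` standard library; on 0.6 the `using` line is not needed.

```julia
using Distributed

# Assumption: two local workers; the real run may use a different count.
nworkers() == 1 && addprocs(2)

# Hypothetical stand-in for the real (moderately expensive) function.
# It must be defined on every process so the workers can call it.
@everywhere myfunction(g, x) = sum(g) + x

# Hypothetical parent function: g arrives as an argument and is captured by
# the closure, so it travels with the closure when work is sent to workers.
function run_all(g, argList)
    serial   = map(x -> myfunction(g, x), argList)    # single-process baseline
    parallel = pmap(x -> myfunction(g, x), argList)   # option 1) above
    return serial, parallel
end

g = collect(1:1000)   # hypothetical stand-in for the large data structure
argList = 1:20        # hypothetical stand-in for the real argument list
serial, parallel = run_all(g, argList)
```

With a `g` this small the transfer cost is negligible, so this only shows the call pattern, not the slowdown itself.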