Data copying in Distributed computing

I have a perfectly parallelizable task, which I want to compute using several processes (threads are not applicable here). Say I want to pmap a function f, taking about 1-10 seconds over a vector A of 10M elements. The problem is that f requires to read from big read-only structures (say a huge Dict of terms). The question is how to split A into chunks to get the best compromise of balancing the workload and at the same time not communicating and copying data much.

So far, I have created f as a closure over all structs needed and iterated over A in chunks of 1k elements:

vcat(pmap(f, Base.Iterators.partition(A, 1024))

I wonder if there is a better solution for this as it seems that a lot of time is spent with copying the data and communication. Unfortunately, the section in docs is quite brief in that regard.

  • Are the structures needed for f copied to the target process every time f is called on another partition or is it done just once at the beginning?
  • Wouldn’t for example an approach of spawning several processes and sending data back and forth through channels more efficient?

Thanks!

1 Like