Data copying in Distributed computing

Sheemon7 · August 28, 2020, 7:25pm

I have a perfectly parallelizable task that I want to compute using several processes (threads are not applicable here). Say I want to pmap a function f, which takes about 1-10 seconds, over a vector A of 10M elements. The problem is that f needs to read from big read-only structures (say a huge Dict of terms). The question is how to split A into chunks to get the best compromise between balancing the workload and keeping communication and data copying to a minimum.

So far, I have created f as a closure over all the structures it needs and iterated over A in chunks of about 1k elements:

reduce(vcat, pmap(f, Iterators.partition(A, 1024)))
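
For concreteness, here is a minimal self-contained sketch of that setup. The names `make_f`, `terms`, and the toy lookup are hypothetical stand-ins for the real data, and two local workers are assumed:

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # assumption: two local worker processes

# Hypothetical stand-in for the big read-only structure
terms = Dict(i => i^2 for i in 1:1000)
A = collect(1:10_000)

# Build f as a closure that captures `terms` and processes one chunk at a time
make_f(terms) = chunk -> [terms[x % 1000 + 1] for x in chunk]
f = make_f(terms)

# Each chunk is sent to a worker; the per-chunk results are concatenated in order
result = reduce(vcat, pmap(f, Iterators.partition(A, 1024)))
```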

I wonder if there is a better solution for this, as it seems that a lot of time is spent copying data and communicating. Unfortunately, the section in the docs is quite brief in that regard.
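
One option that may be relevant here is `CachingPool` from the Distributed standard library, which is documented as caching serialized functions (closures in particular) on the workers. The sketch below passes such a pool to `pmap`; whether it removes the copying overhead in a real workload is something to measure, and `make_f` and `terms` are again hypothetical stand-ins:

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # assumption: two local worker processes

# Hypothetical stand-in for the big read-only structure
terms = Dict(i => i^2 for i in 1:1000)
A = collect(1:10_000)

make_f(terms) = chunk -> [terms[x % 1000 + 1] for x in chunk]
f = make_f(terms)

# A CachingPool is intended to ship the closure (and what it captures)
# to each worker once and reuse it for later calls
pool = CachingPool(workers())
result = reduce(vcat, pmap(f, pool, Iterators.partition(A, 1024)))

# Drop the cached closure from the workers when it is no longer needed
clear!(pool)
```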

Are the structures needed for f copied to the target process every time f is called on another partition or is it done just once at the beginning?
Wouldn’t, for example, an approach of spawning several processes and sending data back and forth through channels be more efficient?
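
A rough sketch of what the channel-based variant could look like, assuming the read-only structure can be built (or loaded) locally on each worker instead of being sent from the main process. `TERMS`, `work_loop`, and the toy data are hypothetical, and `nothing` is used as a stop signal:

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # assumption: two local worker processes

@everywhere begin
    # Hypothetical read-only structure, built once in every process
    const TERMS = Dict(i => i^2 for i in 1:1000)

    # Pull (index, chunk) jobs until the `nothing` stop signal arrives
    function work_loop(jobs, results)
        while true
            job = take!(jobs)
            job === nothing && break
            i, chunk = job
            put!(results, (i, [TERMS[x % 1000 + 1] for x in chunk]))
        end
    end
end

A = collect(1:10_000)
chunks = collect(Iterators.partition(A, 1024))

# Buffered channels hosted on the main process
jobs = RemoteChannel(() -> Channel{Any}(length(chunks) + nworkers()))
results = RemoteChannel(() -> Channel{Any}(length(chunks)))

for (i, chunk) in enumerate(chunks)
    put!(jobs, (i, collect(chunk)))   # only the chunk itself is transferred
end
for _ in workers()
    put!(jobs, nothing)               # one stop signal per worker
end
for p in workers()
    remote_do(work_loop, p, jobs, results)
end

# Results may arrive out of order, so place each one by its chunk index
parts = Vector{Vector{Int}}(undef, length(chunks))
for _ in 1:length(chunks)
    i, vals = take!(results)
    parts[i] = vals
end
result = reduce(vcat, parts)
```

Because idle workers pull the next job themselves, the load balances dynamically, and the chunk size only controls how often a worker talks to the main process.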

Thanks!
