How can I make some data (e.g. an array or a dataframe) available to all workers?
I tried using ParallelDataTransfer.jl but that is not working (see my issue here: https://github.com/ChrisRackauckas/ParallelDataTransfer.jl/issues/16 )
In the example below, assume I want to perform an expensive calculation on vvvec (on several workers in parallel).
using Distributed
using ParallelDataTransfer
addprocs(3)
@everywhere using Random
vvvec=[2,3,1]
sendto(workers(),vvvec=vvvec)
@everywhere vvvec.^2
bump.
Is that a difficult thing to do, or is my question not well formulated?
In my view it is a pretty common use case, e.g. to work on a large piece of data (maybe read from a CSV, or generated by some piece of code) with different parameter settings (maybe model parameters). Thus I often want to share data created on the main worker with all other workers.
I think you are looking for DistributedArrays: https://juliaparallel.github.io/DistributedArrays.jl/latest/
using Distributed
addprocs(3)
@everywhere using DistributedArrays, Distributed
@everywhere f(x) = x * myid()
data = distribute([1,1,1,1])
julia> f.(data)
4-element DArray{Int64,1,Array{Int64,1}}:
2
2
3
4
As you can see, the first two elements are processed on worker 2, and the 3rd and 4th on workers 3 and 4.
Otherwise there’s also SharedArrays.
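A minimal sketch of the SharedArrays route applied to the vvvec example from the question (note SharedArrays only works for worker processes on the same machine, since the data lives in shared memory rather than being copied):

```julia
using Distributed
addprocs(3)

@everywhere using SharedArrays

# A SharedArray lives in shared memory visible to all local workers,
# so no per-worker copies are made:
vvvec = SharedArray{Int}(3)
vvvec .= [2, 3, 1]

# Every worker reads (and writes) the same underlying memory:
result = SharedArray{Int}(3)
@sync @distributed for i in eachindex(vvvec)
    result[i] = vvvec[i]^2
end

collect(result)  # → [4, 9, 1]
```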
I have found the interpolation syntax useful for this:
https://docs.julialang.org/en/v1/stdlib/Distributed/index.html#Distributed.@everywhere
foo = 1
@everywhere bar = $foo
so foo can point to data loaded on your main worker, maybe even unique to its filesystem, and then get loaded everywhere else under bar.
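For example, applying this to vvvec from the question (a minimal sketch; f is just an illustrative name):

```julia
using Distributed
addprocs(3)

# Data created on the main process, e.g. read from a CSV there:
vvvec = [2, 3, 1]

# $-interpolation splices the *value* into the expression sent to the
# workers, so each process ends up with its own global copy of vvvec:
@everywhere vvvec = $vvvec

# A function defined with @everywhere resolves vvvec to the copy on
# whichever worker it runs on:
@everywhere f() = sum(vvvec .^ 2)

[remotecall_fetch(f, w) for w in workers()]  # → [14, 14, 14]
```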