Parallel computing, sharing data with all workers

How can I make some data (e.g. an array or a DataFrame) available to all workers?
I tried using ParallelDataTransfer.jl, but it is not working (see my issue here: https://github.com/ChrisRackauckas/ParallelDataTransfer.jl/issues/16 )

In the example below, assume I want to perform an expensive calculation on vvvec (on several workers in parallel).

using Distributed
using ParallelDataTransfer
addprocs(3)
@everywhere using Random

vvvec = [2, 3, 1]
sendto(workers(), vvvec = vvvec)  # supposed to copy vvvec into Main on every worker
@everywhere vvvec .^ 2            # then do the expensive work on each worker

bump.
Is that a difficult thing to do, or is my question not well formulated?
In my view it is a pretty common use case: working on a large piece of data (maybe read from a CSV, or generated by some piece of code) with different parameter settings (maybe model parameters). So I often want to share data created on the main process with all the other workers.

I think you are looking for DistributedArrays:

https://juliaparallel.github.io/DistributedArrays.jl/latest/

using Distributed
addprocs(3)
@everywhere using DistributedArrays, Distributed

@everywhere f(x) = x * myid()
data = distribute([1,1,1,1])

julia> f.(data)
4-element DArray{Int64,1,Array{Int64,1}}:
 2
 2
 3
 4

As you can see, the first two elements are processed on worker 2, and the 3rd and 4th on workers 3 and 4.
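If your expensive calculation works on whole chunks rather than single elements, you can also ask each worker to process just the block it owns. A minimal sketch, where the sum is just a stand-in for the real computation:

using Distributed
addprocs(3)
@everywhere using DistributedArrays

data = distribute([1, 1, 1, 1])

# each worker processes only its local chunk, so the data never moves:
partials = [@fetchfrom w sum(localpart(data)) for w in procs(data)]
total = sum(partials)

Serializing a DArray only ships a handle, not the data, so the @fetchfrom calls are cheap.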

Otherwise there’s also SharedArrays.
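A minimal SharedArrays sketch for comparison, assuming all workers run on the same machine (the array lives in shared memory, so there is only one copy); the squaring is again just a stand-in for the expensive calculation:

using Distributed
addprocs(3)
@everywhere using SharedArrays

s = SharedArray{Int}(3)   # one copy in shared memory, visible to all local workers
s .= [2, 3, 1]

@sync @distributed for i in 1:length(s)
    s[i] = s[i]^2         # each worker updates its part of the array in place
end

s                         # now contains [4, 9, 1]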

I have found the interpolation syntax useful for this:
https://docs.julialang.org/en/v1/stdlib/Distributed/index.html#Distributed.@everywhere

foo = 1
@everywhere bar = $foo

So foo can point to data loaded on your main process, maybe even data unique to its filesystem, and it then gets defined on every other worker under the name bar.
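Applied to the example from the question, that could look like this (the squaring and the remotecall_fetch check are just illustrations):

using Distributed
addprocs(3)

vvvec = [2, 3, 1]            # exists only on the main process at this point
@everywhere vvvec = $vvvec   # interpolate its value into Main on every worker

# every worker now holds its own copy under the same name, e.g.:
remotecall_fetch(() -> vvvec .^ 2, 2)   # returns [4, 9, 1], computed on worker 2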
