How can I make some data (e.g. an array or a DataFrame) available to all workers?
I tried using ParallelDataTransfer.jl, but that is not working (see my issue here: https://github.com/ChrisRackauckas/ParallelDataTransfer.jl/issues/16).
In the example below, assume I want to perform an expensive calculation on vvvec (on several workers in parallel).
@everywhere using Random
vvvec = rand(1_000)  # data generated on the main worker
Is that a difficult thing to do, or is my question not well formulated?
In my view it is a pretty common use case, e.g. working on a large piece of data (maybe read from a CSV, or generated by some piece of code) with different parameter settings (maybe model parameters). So I often want to share data created on the main process with all other workers.
I think you are looking for DistributedArrays.jl:
@everywhere using Distributed, DistributedArrays
@everywhere f(x) = x * myid()
data = distribute([1, 1, 1, 1])
collect(map(f, data))
As you can see, the first two elements are processed on worker 2, and the 3rd and 4th on workers 3 and 4.
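Here is a minimal end-to-end sketch of that approach, assuming no workers are running yet (the call to addprocs and the worker count of 3 are my choice for illustration):

```julia
using Distributed
addprocs(3)                      # assumption: start 3 fresh worker processes

@everywhere using DistributedArrays
@everywhere f(x) = x * myid()    # myid() reveals which process handled each element

data = distribute([1, 1, 1, 1]) # chunks now live on workers 2, 3 and 4
result = collect(map(f, data))  # f runs on the worker that owns each chunk
```

Since `myid()` is 1 on the main process, every value ≥ 2 in `result` confirms the work happened on a worker rather than on the main process.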
Otherwise there’s also
I have found the interpolation syntax useful for this:
foo = 1
@everywhere bar = $foo
foo can point to data loaded on your main worker (maybe even data unique to its filesystem), which then gets copied to every other worker under the name bar.
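To make the pattern concrete, here is a small sketch matching your use case; the variable names (bigdata) and the worker count are illustrative assumptions:

```julia
using Distributed
addprocs(2)                 # assumption: start two worker processes

bigdata = rand(1_000)       # expensive data, created only on the main process

# $ interpolates the *value* of bigdata into the expression sent to
# each worker, so every process receives its own copy of the array.
@everywhere bigdata = $bigdata

# each worker can now read bigdata directly in parallel work:
results = pmap(i -> sum(bigdata) + i, 1:4)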