I am having a problem that I think is well illustrated by this StackExchange discussion.
As far as I understand that discussion, one needs to be careful when using Distributed to balance two issues:
- Each worker needs to be able to see all of the necessary code.
- If packages need to be pre-compiled, then one could run into a race condition if each worker tries to compile.
I was running into this problem with a more complicated set of code, so I decided to try to make a simple example to understand the issues.
One solution suggested in the StackExchange post is to do something like this:
```julia
using Distributed
using DataFrames
@everywhere using DataFrames
```
with the idea (as I understand it) being that the first `using` triggers precompilation in serial, and the subsequent `@everywhere using` then loads the code onto each worker.
I have verified that this works as expected for me.
However, suppose now that instead of `DataFrames`, I want to load my very simple module called `Run`:

```julia
# Run.jl
module Run

function runsims()
    println("hello!")
end

end
```
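As a sanity check (my own aside, not from the StackExchange post), the module itself is fine in serial; here is a self-contained sketch with the same module defined inline instead of loaded from Run.jl:

```julia
# Self-contained serial check: define the same module inline in Main.
module Run

function runsims()
    println("hello!")
end

end

# A module defined inline in Main can be referenced directly (or via
# `using .Run` with a leading dot); only file-based packages go through
# the package loader that a plain `using Run` invokes.
Run.runsims()  # prints "hello!"
```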
If I start a Julia process with no options, this code runs fine:

```julia
using Distributed
using Run
@everywhere using Run
```
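One thing I noticed (an observation of mine, not from the post): with no `-p` flag there are no extra worker processes at all, so `@everywhere` only has the main process to run on, which may be why this case succeeds:

```julia
using Distributed

# With no -p flag, the only process is the main one (id 1).
println(nprocs())    # 1
println(workers())   # [1] -- the main process doubles as the worker

# @everywhere therefore executes only on process 1.
@everywhere println(myid())
```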
However, if I start with `julia -p 1` and run the same code, I get this error at the `@everywhere using Run` line:

```
ERROR: On worker 2:
ArgumentError: Package Run not found in current path:
- Run `import Pkg; Pkg.add("Run")` to install the Run package.
```
Why does this error occur?
If I try this:
```julia
using Distributed
using Pkg  # needed before calling Pkg.add
Pkg.add("Run")
using Run
@everywhere using Run
```
then I also get an error:
```
ERROR: LoadError: The following package names could not be resolved:
 * Run (not found in project, manifest or registry)
Please specify by known `name=uuid`.
```
Am I just using modules in the wrong way?
I have not had problems with this type of approach until trying to parallelize with Distributed.