Distributed, packages, and a race condition in precompilation

I am having a problem that I think is well represented by the discussion in this StackExchange question.
As far as I understand that discussion, one needs to balance two issues when using Distributed:

  1. Each worker needs to be able to see all of the necessary code.
  2. If packages need to be pre-compiled, then one could run into a race condition if each worker tries to compile.

I was running into this problem with a more complicated set of code, so I put together a minimal example to understand the issue.

One solution suggested in the StackExchange post is to do something like this:

using Distributed
using DataFrames
@everywhere using DataFrames

with the idea (as I understand it) being that the first using precompiles the package serially, and the @everywhere using then loads the code onto each worker.
I have verified that this works as expected for me.
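For concreteness, here is the same pattern with the workers added programmatically instead of via -p (the worker count here is arbitrary):

```julia
using Distributed
addprocs(2)                   # equivalent to starting with `julia -p 2`
using DataFrames              # first `using` precompiles once, serially, on the master
@everywhere using DataFrames  # workers then load the already-compiled cache
```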

However, suppose now that instead of DataFrames, I want to load my very simple module called Run:

# Run.jl
module Run

export runsims

# Placeholder for the real simulation code
function runsims()
    println("running simulations...")
end

end # module
I start a Julia process with no options and run this code:

using Distributed
using Run
@everywhere using Run

No problems.

However, if I start with julia -p 1 and try to run the same code, I get this error when it reaches @everywhere using Run:

ERROR: On worker 2:
ArgumentError: Package Run not found in current path:
- Run `import Pkg; Pkg.add("Run")` to install the Run package.

Why does this error occur?
If I follow the error message's suggestion and try this:

import Pkg
Pkg.add("Run")

then I also get an error:

ERROR: LoadError: The following package names could not be resolved:
 * Run (not found in project, manifest or registry)
Please specify by known `name=uuid`.

Am I just using modules in the wrong way?
I have not had problems with this type of approach until trying to parallelize with Distributed.


Forgot something: all of this is premised on my having added the current working directory to LOAD_PATH:

push!(LOAD_PATH, ".")