Understanding `@everywhere` and environments

I am working on a project that I’ve put into a module. The directory of this module is in my `Base.load_path()`, and running `using thvaccine` works as expected. Even when I am in the v1.0 environment, `using thvaccine` still works.

However, when I try to load it on multiple workers (on my local machine), I get an error.

julia> @everywhere using thvaccine
ERROR: On worker 2:
ArgumentError: Package thvaccine not found in current path:
- Run `import Pkg; Pkg.add("thvaccine")` to install the thvaccine package.

This is surprising to me, since the path is available via `Base.load_path()` and the main Julia process has no trouble locating the package. Anyway, the solution to this problem can be found here. In particular, running

addprocs(4; exeflags="--project")
@everywhere using thvaccine

works perfectly. As far as I know, the `--project` flag is essentially the same as running `] activate .`. But I am very confused about how environments work. I thought environments only really mattered for the Project.toml and Manifest.toml files. In fact, when I launch a fresh Julia session and run `using thvaccine` in the v1.0 environment (since the path is in `Base.load_path()`), everything works fine.

I rarely even type `] activate .`, except when I want to add a dependency to my project or run unit tests through `]`.

So why do my workers need to be started with the project? What does “activating the environment” have to do with parallel workers? I feel like my understanding of environments is less than ideal.

I have not been able to solve this very same issue, even after using the solution you propose. Do you have any idea what might be happening?

I do not understand the intuition behind the problem very well either.

The worker processes do not execute startup.jl. This is likely why `LOAD_PATH` on the workers does not point to your module. See the docs:

Note that workers do not run a ~/.julia/config/startup.jl startup script, nor do they synchronize their global state (such as global variables, new method definitions, and loaded modules) with any of the other running processes.
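One way to check this on your own machine is to compare the load path seen by the main process with the one seen by each worker. This is only a minimal sketch: it assumes two local workers started without any `exeflags`, and the variable names `master_path` and `worker_paths` are just illustrative.

```julia
using Distributed

# Start two plain local workers (no --project, no extra flags).
addprocs(2)

# Expanded load path on the main process.
master_path = Base.load_path()

# Ask every worker for its own expanded load path.
worker_paths = Dict(w => remotecall_fetch(Base.load_path, w) for w in workers())

println("master: ", master_path)
for (w, p) in worker_paths
    println("worker ", w, ": ", p)
end
```

If an entry shows up for the main process but not for the workers, that entry is probably being added somewhere the workers never run, such as startup.jl.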

Then, what is the solution for this problem?

I don’t have too much experience with this, but I like to run code on remote machines in new, “clean” directories (no Project.toml). That way I can control the environment and don’t have to worry about multiple processes interfering with each other’s environments.

Before starting anything remotely, I run a script that prepares the environment (basically running a couple of Pkg.add or Pkg.develop). When you then activate the prepared environment before running the actual code of interest, everything loads nicely.
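A rough sketch of that workflow, under some assumptions: the clean directory is a temporary one created with `mktempdir()` (the name `env_dir` is made up for this example), and the actual `Pkg.develop` / `Pkg.add` calls are left commented out because the paths and package names depend on your setup.

```julia
using Pkg
using Distributed

# Hypothetical clean directory for the environment (no Project.toml yet).
env_dir = mktempdir()

# Step 1: prepare the environment.
Pkg.activate(env_dir)
# Pkg.develop(path="path-to-your-package")  # placeholder path, adjust to your setup
# Pkg.add("SomeDependency")                 # placeholder package name

# Step 2: start the workers in that same environment.
addprocs(2; exeflags="--project=$(env_dir)")

# Check which project each worker ended up with.
worker_projects = [remotecall_fetch(Base.active_project, w) for w in workers()]
println(worker_projects)
```

Passing the directory explicitly to `--project` avoids depending on whatever directory the workers happen to start in.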

This isn’t an elegant or general solution at all, but in a pinch you can always run

@everywhere push!(LOAD_PATH, "path-to-add")
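To make that concrete, here is a small self-contained sketch. It uses a throwaway module called `DemoPkg`, written into a temporary directory, as a stand-in for the real package; both `DemoPkg` and `pkg_root` are made up for this example and play the role of "path-to-add".

```julia
using Distributed

addprocs(2)

# Hypothetical stand-in for "path-to-add": a temp directory holding a tiny module.
pkg_root = mktempdir()
mkpath(joinpath(pkg_root, "DemoPkg", "src"))
write(joinpath(pkg_root, "DemoPkg", "src", "DemoPkg.jl"),
      "module DemoPkg\ngreet() = \"hello from DemoPkg\"\nend\n")

# Add the directory to LOAD_PATH on the main process and on every worker.
@everywhere push!(LOAD_PATH, $pkg_root)

# Now the module can be found everywhere.
@everywhere using DemoPkg

# Call the module's function on each worker and collect the results.
results = [remotecall_fetch(DemoPkg.greet, w) for w in workers()]
println(results)
```

Note the `$` in front of `pkg_root`: it interpolates the value from the main process, since the workers do not share its variables.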