Packages and workers

#1

I don’t understand how to import a package that I generated on multiple workers on Julia v1.0.

Steps to reproduce:

  1. Follow the documentation to create the HelloWorld package.
  2. Start Julia with -p 2.
  3. Activate the environment with ] activate .
  4. import HelloWorld.

I get

ERROR: On worker 2:
ArgumentError: Package HelloWorld not found in current path:
- Run `Pkg.add("HelloWorld")` to install the HelloWorld package.

require at ./loading.jl:817
eval at ./boot.jl:319
#116 at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/process_messages.jl:276
run_work_thunk at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/process_messages.jl:56
run_work_thunk at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/process_messages.jl:65
#102 at ./task.jl:259
#remotecall_wait#154(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Distributed.Worker, ::Module, ::Vararg{Any,N} where N) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/remotecall.jl:407
remotecall_wait(::Function, ::Distributed.Worker, ::Module, ::Vararg{Any,N} where N) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/remotecall.jl:398
#remotecall_wait#157(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Int64, ::Module, ::Vararg{Any,N} where N) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/remotecall.jl:419
remotecall_wait(::Function, ::Int64, ::Module, ::Vararg{Any,N} where N) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/remotecall.jl:419
(::getfield(Distributed, Symbol("##163#165")){Module,Expr})() at ./task.jl:259

...and 2 more exception(s).

Stacktrace:
 [1] sync_end(::Array{Any,1}) at ./task.jl:226
 [2] remotecall_eval(::Module, ::Array{Int64,1}, ::Expr) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/macros.jl:207
 [3] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/macros.jl:190

Same result if I use @everywhere. If I start Julia with only one process the import is fine.
What am I missing?


#2

I am still a total Pkg3 newbie but I got rid of your error (which I reproduced) by doing

using Pkg
Pkg.instantiate()

After that, using HelloWorld works.


#3

It doesn’t solve the problem for me. I still get a similar error:

ERROR: On worker 2:
ArgumentError: Package HelloWorld [cf8bea5c-a91a-11e8-2ba1-4f0c15a7916a] is required but does not seem to be installed:
 - Run `Pkg.instantiate()` to install all recorded dependencies.

When do you call Pkg.instantiate()? After having activated the environment?


#4

This issue explains the problem. activate . is only executed on the master process, so I need a

using Distributed
using HelloWorld   # works
addprocs(3)
@everywhere begin
  using Pkg; Pkg.activate(".")  # required
  using HelloWorld
end

to get it loaded correctly.
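One fragile point in the snippet above is that Pkg.activate(".") is resolved against each process's own working directory. A minimal sketch, assuming Julia was started inside the project directory and all workers run on the same machine, passes an absolute path instead. The names project_dir and active are made up for illustration, and Base.active_project() is an unexported Base function used here only as a sanity check.

```julia
using Distributed
using Pkg

# Absolute path of the project directory (assumes Julia was started inside it).
project_dir = abspath(".")
Pkg.activate(project_dir)          # activate on the master process

addprocs(2)

# `$project_dir` interpolates the master's value into the expression
# that @everywhere evaluates on every process.
@everywhere begin
    using Pkg
    Pkg.activate($project_dir)
end

# Collect the project file each process reports as active.
active = [remotecall_fetch(Base.active_project, p) for p in procs()]

# With a real project in project_dir, its packages can now be loaded, e.g.:
# @everywhere using HelloWorld
```

If every entry of active is the same path, the workers and the master agree on the environment.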

I’d say it’s a pretty convoluted way of loading a package. Also, there should be some sort of warning, or you lose a morning figuring this out.


#5

The documentation does a pretty good job at explaining this.

In order to refer to MyType across all processes, DummyModule.jl needs to be loaded on every process. Calling include("DummyModule.jl") loads it only on a single process. To load it on every process, use the @everywhere macro (starting Julia with julia -p 2 ):


#6

Still, assuming that users would like to use the activated environment (and loaded packages) across workers seems like a reasonable default to me. Are there any technical/usability reasons why this is not the case?


#7

The documentation fails to mention the need to activate the environment with @everywhere. As I wrote in the OP, the @everywhere import fails.


#8

If you activate the environment and restart julia, do you still have to activate it to import on all workers? Since you start julia with workers in a state where the environment is not fully set up, maybe code loading on the workers is affected. Once the environment has been set up, new workers started will perhaps have this same environment?


#9

Yes, because Julia starts in the v1.0 environment by default. I also tried activating the environment first and then launching the worker processes with addprocs(), but I still get an ArgumentError: Package HelloWorld not found in current path: from worker 2.


#10

I solved this issue in my case by passing the --project flag to addprocs. This will load the project files for the current directory.

Start Julia using julia --project.

using Distributed
addprocs(2; exeflags="--project")

You can also pass the path to the desired environment using --project=/path/to/env. As long as the path in addprocs matches the one used to start Julia it should work fine.
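When Julia was not necessarily started with --project, the same idea can be sketched by forwarding whatever project is active on the master. This is only an illustration: it assumes the workers are local (the path must exist wherever a worker runs), and it relies on Base.active_project(), an unexported Base function. The variable names project and worker_projects are hypothetical.

```julia
using Distributed

# Project file that is active on the master process.
project = Base.active_project()
project === nothing && error("no active project on the master process")

# Start each worker with the same project on its command line.
addprocs(2; exeflags="--project=$(project)")

# Ask every worker which project it considers active.
worker_projects = [remotecall_fetch(Base.active_project, w) for w in workers()]

# With a real project active, its packages can now be loaded, e.g.:
# @everywhere using HelloWorld
```

Interpolating the master's path avoids having to keep the path given to addprocs in sync with the one used to start Julia.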


#11

Thanks, this is an interesting workaround.


#12

I seem to be getting the same issue on v1.0 but the Pkg.activate fix doesn’t work. As a test, I’m running…

using Distributed
addprocs(2)
@everywhere push!(LOAD_PATH, ".")
@everywhere using Pkg
@everywhere Pkg.activate(".")
@everywhere using M

…where module M only contains the line println("Loaded M on $(myid())"). This returns the error

LoadError: On worker 2: Failed to precompile M [top-level]

As with teored90’s issue, it works just fine in serial. When I try to load a package via @everywhere using (I was testing with Interpolations) rather than a user-defined module, it also fails, only with a directory error instead of a precompile one (this also works in serial). The addprocs(2; exeflags="--project") fix mentioned by ksmcreynolds didn’t seem to have any effect. Has anyone seen or figured out a way around this problem? Thanks!


#13

I just ran into this issue as well. I find the current default unintuitive, but perhaps there are good reasons for it.


#14

There’s an open issue about this here: https://github.com/JuliaLang/julia/issues/28781

The problem, AFAIU, is that there’s no way to guarantee that the paths for deved packages are portable across machines (which can happen e.g. on a large cluster). Here’s what @StefanKarpinski wrote:

This requires some careful thinking about how best to do this. One possibility is to send a manifest from the master node to the workers and insist that the manifest be usable on all nodes. That’s fine for non-dev packages and even for dev packages with relative paths but dev packages with absolute paths may be a bit of an issue. We could rewrite dev paths to use a commit instead and then send that over; of course, it requires that the workers know about the tree hash that’s being used…

The recommended workaround is suggested by @simonbyrne:

Another work around is to use the JULIA_PROJECT environment variable instead of --project , as that will be passed to subprocesses.
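A rough sketch of the JULIA_PROJECT approach from inside a running session, assuming the workers are local processes that inherit the master's environment variables. Base.active_project() is an unexported Base function, and the names project and worker_projects are hypothetical.

```julia
using Distributed

# Project file that is active on the master process.
project = Base.active_project()
project === nothing && error("no active project on the master process")

# Processes started after this point on the same machine inherit the variable.
ENV["JULIA_PROJECT"] = project

addprocs(2)

# Ask every worker which project it considers active.
worker_projects = [remotecall_fetch(Base.active_project, w) for w in workers()]

# With a real project active, its packages can now be loaded, e.g.:
# @everywhere using HelloWorld
```

Setting JULIA_PROJECT in the shell before launching Julia should have the same effect without any code in the session, which is closer to what the quote describes.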