Distributed parallelism within packages/applications

How can I use parallelism (on multiple processes) in a Julia application/package?
I have an application (in the Pkg3 sense, i.e. a project that is not a package) in which I use pmap.
I tried to load the package with using ParallelTest after doing

using Distributed
addprocs(2)

but I get an error saying that the package is not installed.

Steps to reproduce:

(v0.7) pkg> generate ParallelTest
Generating project ParallelTest:
    ParallelTest\Project.toml
    ParallelTest/src/ParallelTest.jl

shell> cd ParallelTest
C:\Users\memo\Documents\julia\ParallelTest

(v0.7) pkg> activate .

julia> using Distributed

julia> addprocs(2)
2-element Array{Int64,1}:
 2
 3

julia> using ParallelTest
[ Info: Precompiling ParallelTest [0e712700-ac5e-11e8-3696-ef47f8f04c4e]
ERROR: On worker 2:
ArgumentError: Package ParallelTest [0e712700-ac5e-11e8-3696-ef47f8f04c4e] is required but does not seem to be installed:
 - Run `Pkg.instantiate()` to install all recorded dependencies.

_require at .\loading.jl:923
require at .\loading.jl:852
#2 at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\Distributed.jl:77
#116 at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\process_messages.jl:276
run_work_thunk at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\process_messages.jl:56
run_work_thunk at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\process_messages.jl:65
#102 at .\task.jl:262
#remotecall_wait#154(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Distributed.Worker) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\remotecall.jl:407
remotecall_wait(::Function, ::Distributed.Worker) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\remotecall.jl:398
#remotecall_wait#157(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Int64) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\remotecall.jl:419
remotecall_wait(::Function, ::Int64) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\remotecall.jl:419
(::getfield(Distributed, Symbol("##1#3")){Base.PkgId})() at .\task.jl:262

...and 1 more exception(s).

Stacktrace:
 [1] sync_end(::Array{Any,1}) at .\task.jl:229
 [2] macro expansion at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\Distributed.jl:75 [inlined]
 [3] macro expansion at .\task.jl:247 [inlined]
 [4] _require_callback(::Base.PkgId) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\Distributed.jl:74
 [5] #invokelatest#1 at .\essentials.jl:691 [inlined]
 [6] invokelatest at .\essentials.jl:690 [inlined]
 [7] require(::Base.PkgId) at .\loading.jl:855
 [8] macro expansion at .\logging.jl:311 [inlined]
 [9] require(::Module, ::Symbol) at .\loading.jl:834
using Distributed
addprocs(2)
@everywhere using Pkg
@everywhere Pkg.activate(".")
@everywhere using ParallelTest

should do the trick, but I don’t know whether this is the best way to solve your problem.

If all you want is to use a method from ParallelTest that calls pmap on your main process, you have to do using ParallelTest before adding processes. The second process doesn’t need ParallelTest per se, but it does need the function passed to pmap:

help?> pmap
Search: pmap promote_shape typemax PermutedDimsArray process_messages

  pmap(f, [::AbstractWorkerPool], c...; distributed=true, batch_size=1, on_error=nothing, retry_delays=[], retry_check=nothing) -> collection

  Transform collection c by applying f to each element using available workers and tasks.

  For multiple collection arguments, apply f elementwise.

  Note that f must be made available to all worker processes; see Code Availability and Loading Packages for details.

#<snip>
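
A minimal sketch of what the help text means, assuming a plain script rather than a package (the function name square is hypothetical): defining the function with @everywhere makes it exist on the master and on every worker, so pmap can call it on the workers.

```julia
using Distributed
addprocs(2)

# Define the (hypothetical) function on the master and on every worker.
@everywhere square(x) = x^2

# pmap sends each element to a worker, which calls its own copy of `square`.
results = pmap(square, 1:5)
```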

It’s a bit more complicated. I have a function that uses pmap, but it depends on a package, so the package should also be available on the workers.

Is there a way to load only some of the files of the package on all the workers?

Did you try @Rudi79’s code? It should do the trick. There is an open issue on GitHub that should fix this, pending some discussion.

Not that I know of. You can, however, also load the other packages on the workers using @everywhere using AnotherPackage.
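
For example, with the LinearAlgebra standard library standing in for AnotherPackage and a hypothetical helper sqnorm that depends on it, a sketch would load the package and define the helper on every process before calling pmap:

```julia
using Distributed
addprocs(2)

# Bring LinearAlgebra's exported names (such as `dot`) into Main on every process.
@everywhere using LinearAlgebra

# Hypothetical helper that depends on the package; it must exist on every worker too.
@everywhere sqnorm(v) = dot(v, v)

res = pmap(sqnorm, [[3.0, 4.0], [1.0, 2.0]])
```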

Thanks for the answers!
I am trying to load some files on all workers, but I am not sure how to do it properly.
Considering the example in the first post, I changed the contents of ParallelTest.jl to:

__precompile__(false)

module ParallelTest

using Distributed
@everywhere using Pkg
@everywhere Pkg.activate(".")

@everywhere include("$(@__DIR__)/parallel.jl")

@everywhere using .PMod

greet() = print("Hello World!")

end # module

and added another file, parallel.jl, in the src directory containing:

module PMod

println("loaded")

end  # module PMod

I managed to load the module successfully, but I get "replacing module" warnings for PMod, the module defined in parallel.jl:

(v0.7) pkg> activate .

julia> using Distributed

julia> addprocs(2)
2-element Array{Int64,1}:
 2
 3

julia> using ParallelTest
[ Info: Precompiling ParallelTest [0e712700-ac5e-11e8-3696-ef47f8f04c4e]
loaded
      From worker 3:    loaded
      From worker 2:    loaded
      From worker 2:    WARNING: replacing module PMod.
WARNING: replacing module PMod.
      From worker 2:    loadedWARNING: replacing module PMod.

      From worker 3:    WARNING: replacing module PMod.
loaded
loaded
      From worker 2:    WARNING: replacing module PMod.
      From worker 2:    loaded
      From worker 3:    loaded
      From worker 3:    WARNING: replacing module PMod.
      From worker 3:    loaded

I also tried using .PMod instead of @everywhere using .PMod, but I get a LoadError saying that PMod is not defined. I think the problem is that on the workers PMod is defined in the global scope, while on the master process the using statement is inside a module.
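
A possible workaround, sketched under the assumption that PMod does not need to live inside ParallelTest: define the module once per process at top level in Main (for example from a script), instead of calling @everywhere include from inside another module's body. Presumably the repeated warnings above appear because loading ParallelTest on the master also loads it on the workers, and every process then runs the @everywhere include again. The function work is hypothetical.

```julia
using Distributed
addprocs(2)

# Define PMod exactly once on each process, at top level in Main.
@everywhere module PMod
    export work
    work(x) = 2x   # hypothetical function provided by the module
end

# Relative `using` works here because PMod lives directly in Main on every process.
@everywhere using .PMod

res = pmap(work, 1:3)
```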

Can you give a link to the issue so that I can follow it?

Was there any update on this? I’m failing to load my custom package on the workers. I also tried the recommended snippet

@everywhere begin
    using Pkg
    Pkg.activate(".")
    using MyPackage
end

without luck. This was definitely easier in 0.6.

I think using MyPackage should not be in the @everywhere block. What errors do you get?

I had the same error as stated above:

ArgumentError: Package ... is required but does not seem to be installed:
 - Run `Pkg.instantiate()` to install all recorded dependencies.
etc.

but I found the reason just now. The “startup.jl” file is only executed on the main process, not on the workers, so my “LOAD_PATH” on the workers did not include the required directories. On 0.6.4, “.juliarc.jl” was also executed on the workers. Running @everywhere println(LOAD_PATH) shows this. So now I just need to find out how to get this LOAD_PATH to all my workers and I should be fine, with no need for Pkg.activate and all that stuff.
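
A minimal sketch of copying the master's LOAD_PATH to the workers, assuming the workers are added after startup.jl has already run on the master: @everywhere accepts the list of target processes as its first argument, and a value prefixed with $ is taken from the calling process.

```julia
using Distributed
addprocs(2)

# Snapshot the master's LOAD_PATH (including anything startup.jl added to it).
master_load_path = copy(LOAD_PATH)

# Overwrite LOAD_PATH on the workers only; `$master_load_path` is the master's copy.
@everywhere workers() begin
    empty!(LOAD_PATH)
    append!(LOAD_PATH, $master_load_path)
end
```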

Bumping this thread, as there does not seem to be a working solution for everyone here (including me).

Have you been able to resolve this? I am struggling with the same issue for the packages SpecialFunctions and Match… I have tried all the suggested remedies (Pkg.instantiate etc.) to no avail.

This should work if you move using MyPackage into its own @everywhere block (the macro moves all import statements to the top of the block, which breaks your original attempt).
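
A sketch of the split version, under two assumptions: the environment that is active on the master is the one the workers should use (its directory is interpolated with $, so the workers do not depend on their own working directory), and the LinearAlgebra standard library stands in for MyPackage so the snippet runs anywhere.

```julia
using Distributed
addprocs(2)

# Directory of the master's active project (Base.active_project() returns its Project.toml path).
projdir = dirname(Base.active_project())

# Block 1: activate the same environment on every process.
@everywhere begin
    using Pkg
    Pkg.activate($projdir)
end

# Block 2: load the package separately, so it is not hoisted above Pkg.activate.
# LinearAlgebra stands in for MyPackage here.
@everywhere using LinearAlgebra
```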

Thanks for the suggestion. I’m still having problems, though, even just getting standard packages like ParallelDataTransfer, Interpolations and SpecialFunctions to work; I’m not even attempting my own packages for now.

<<loading of all packages on the master process left out; this works without fail>>

using Distributed
addprocs(2) # verified that I now have 3 procs

@everywhere begin
    using Pkg
    Pkg.activate(".")
end

@everywhere using ParallelDataTransfer # FAILS SEE OUTPUT BELOW
@everywhere using FFTW # OK!
@everywhere using LinearAlgebra # OK!
@everywhere using SpecialFunctions # FAILS
@everywhere using LowRankApprox # FAILS
@everywhere using Interpolations # FAILS
@everywhere using Match # FAILS

The message generated for ParallelDataTransfer is

On worker 2:
ArgumentError: Package ParallelDataTransfer not found in current path:

  • Run import Pkg; Pkg.add("ParallelDataTransfer") to install the ParallelDataTransfer package.
    require at .\loading.jl:823
    eval at .\boot.jl:328
    #116 at C:\Users\julia\AppData\Local\Julia-1.1.1\share\julia\stdlib\v1.1\Distributed\src\process_messages.jl:276
    run_work_thunk at C:\Users\julia\AppData\Local\Julia-1.1.1\share\julia\stdlib\v1.1\Distributed\src\process_messages.jl:56
    run_work_thunk at C:\Users\julia\AppData\Local\Julia-1.1.1\share\julia\stdlib\v1.1\Distributed\src\process_messages.jl:65
    #102 at .\task.jl:259
    #remotecall_wait#154(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Distributed.Worker, ::Module, ::Vararg{Any,N} where N) at C:\Users\julia\AppData\Local\Julia-1.1.1\share\julia\stdlib\v1.1\Distributed\src\remotecall.jl:421
    remotecall_wait(::Function, ::Distributed.Worker, ::Module, ::Vararg{Any,N} where N) at C:\Users\julia\AppData\Local\Julia-1.1.1\share\julia\stdlib\v1.1\Distributed\src\remotecall.jl:412
    #remotecall_wait#157(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Int64, ::Module, ::Vararg{Any,N} where N) at C:\Users\julia\AppData\Local\Julia-1.1.1\share\julia\stdlib\v1.1\Distributed\src\remotecall.jl:433
    remotecall_wait(::Function, ::Int64, ::Module, ::Vararg{Any,N} where N) at C:\Users\julia\AppData\Local\Julia-1.1.1\share\julia\stdlib\v1.1\Distributed\src\remotecall.jl:433
    (::getfield(Distributed, Symbol("##161#163")){Module,Expr})() at .\task.jl:259
    …and 1 more exception(s).

    in top-level scope at stdlib\v1.1\Distributed\src\macros.jl:183
    in remotecall_eval at stdlib\v1.1\Distributed\src\macros.jl:199
    in macro expansion at base\task.jl:245
    in sync_end at base\task.jl:226

Have you confirmed that Pkg.activate(".") actually does what you think? "." assumes that your workers started in the directory you expect; maybe try @everywhere println(pwd()) before trying to load packages to confirm you are where you think you are.
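
A slightly extended version of that check, as a sketch: it prints each process's id, working directory and active project file (using the internal function Base.active_project()), and collects the workers' directories so they can be compared with the master's.

```julia
using Distributed
addprocs(2)

# Print id, working directory and active project file on every process.
@everywhere println("process ", myid(), ": pwd = ", pwd(),
                    ", project = ", Base.active_project())

# Gather the workers' working directories for a programmatic comparison.
worker_dirs = [remotecall_fetch(pwd, w) for w in workers()]
```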