Distributed parallelism within packages/applications

question

#1

How can I use parallelism (across multiple processes) in a Julia application or package?
I have an application (in the Pkg3 sense, so not a package) in which I use pmap.
I tried to load the package with using ParallelTest after running

using Distributed
addprocs(2)

but I get an error saying that the package is not installed.

Steps to reproduce:

(v0.7) pkg> generate ParallelTest
Generating project ParallelTest:
    ParallelTest\Project.toml
    ParallelTest/src/ParallelTest.jl

shell> cd ParallelTest
C:\Users\memo\Documents\julia\ParallelTest

(v0.7) pkg> activate .

julia> using Distributed

julia> addprocs(2)
2-element Array{Int64,1}:
 2
 3

julia> using ParallelTest
[ Info: Precompiling ParallelTest [0e712700-ac5e-11e8-3696-ef47f8f04c4e]
ERROR: On worker 2:
ArgumentError: Package ParallelTest [0e712700-ac5e-11e8-3696-ef47f8f04c4e] is required but does not seem to be installed:
 - Run `Pkg.instantiate()` to install all recorded dependencies.

_require at .\loading.jl:923
require at .\loading.jl:852
#2 at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\Distributed.jl:77
#116 at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\process_messages.jl:276
run_work_thunk at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\process_messages.jl:56
run_work_thunk at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\process_messages.jl:65
#102 at .\task.jl:262
#remotecall_wait#154(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Distributed.Worker) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\remotecall.jl:407
remotecall_wait(::Function, ::Distributed.Worker) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\remotecall.jl:398
#remotecall_wait#157(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Int64) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\remotecall.jl:419
remotecall_wait(::Function, ::Int64) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\remotecall.jl:419
(::getfield(Distributed, Symbol("##1#3")){Base.PkgId})() at .\task.jl:262

...and 1 more exception(s).

Stacktrace:
 [1] sync_end(::Array{Any,1}) at .\task.jl:229
 [2] macro expansion at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\Distributed.jl:75 [inlined]
 [3] macro expansion at .\task.jl:247 [inlined]
 [4] _require_callback(::Base.PkgId) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Distributed\src\Distributed.jl:74
 [5] #invokelatest#1 at .\essentials.jl:691 [inlined]
 [6] invokelatest at .\essentials.jl:690 [inlined]
 [7] require(::Base.PkgId) at .\loading.jl:855
 [8] macro expansion at .\logging.jl:311 [inlined]
 [9] require(::Module, ::Symbol) at .\loading.jl:834

#2
using Distributed
addprocs(2)
@everywhere using Pkg
@everywhere Pkg.activate(".")
@everywhere using ParallelTest

should do the trick, but I don’t know whether this is the best way to solve your problem.
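A possible alternative, offered as a sketch rather than something confirmed in this thread: start the workers with the project already active by passing `--project` through the `exeflags` keyword of `addprocs`. The assumption here is that the current directory is the application directory containing Project.toml; `project_dir`, `new_workers` and `worker_projects` are just illustrative names.

```julia
using Distributed

# Assumption: the current directory holds the application's Project.toml.
project_dir = pwd()

# Launch two workers whose julia processes are started with --project=<dir>,
# so no Pkg.activate call is needed on them afterwards.
new_workers = addprocs(2; exeflags="--project=$(project_dir)")

# Ask each worker which project file it considers active.
worker_projects = [remotecall_fetch(Base.active_project, w) for w in new_workers]
println(worker_projects)
```

If the printed paths point at the application's Project.toml, a plain using ParallelTest on the main process should be able to find the package on the workers as well.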


#3

If all you want is to use a method from ParallelTest that contains pmap on your main process, you have to do using ParallelTest before adding processes. The worker processes don’t need ParallelTest per se, but they do need the function passed to pmap:

help?> pmap
Search: pmap promote_shape typemax PermutedDimsArray process_messages

  pmap(f, [::AbstractWorkerPool], c...; distributed=true, batch_size=1, on_error=nothing, retry_delays=[], retry_check=nothing) -> collection

  Transform collection c by applying f to each element using available workers and tasks.

  For multiple collection arguments, apply f elementwise.

  Note that f must be made available to all worker processes; see Code Availability and Loading Packages for details.

#<snip>
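To make the note about availability concrete, here is a minimal sketch. `slow_square` is a made-up function standing in for whatever you pass to pmap; defining it with `@everywhere` makes it known on the main process and on every worker.

```julia
using Distributed
addprocs(2)

# Hypothetical work function; @everywhere defines it on all processes.
@everywhere slow_square(x) = (sleep(0.1); x^2)

# pmap hands the elements to the workers and returns results in input order.
results = pmap(slow_square, 1:6)
println(results)
```

Without the `@everywhere`, the workers would have no definition of `slow_square` to call.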

#4

It’s a bit more complicated. I have a function that uses pmap, but it depends on a package, so the package should also be available on the workers.

Is there a way to load only some of the files of the package on all the workers?


#5

Did you try @Rudi79’s code? It should do the trick. There is an open issue on GitHub whose resolution should fix this, pending some discussion.


#6

Not that I know of. You can, however, load the other packages on the workers too with @everywhere using AnotherPackage.
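As a small sketch of that pattern, with the LinearAlgebra standard library standing in for AnotherPackage and `vector_length` as a made-up function that depends on it:

```julia
using Distributed
addprocs(2)

# Load the dependency on every process (stdlib used here as a stand-in).
@everywhere using LinearAlgebra

# Hypothetical function that needs the dependency on whichever worker runs it.
@everywhere vector_length(v) = norm(v)

lengths = pmap(vector_length, [[3.0, 4.0], [6.0, 8.0]])
println(lengths)
```

The order matters: the package has to be loaded on the workers before any function that uses it is called there.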


#7

Thanks for the answers!
I am trying to load some files on all workers, but I am not sure how to do it properly.
Considering the example in the first post, I changed the contents of ParallelTest.jl to:

__precompile__(false)

module ParallelTest

using Distributed
@everywhere using Pkg
@everywhere Pkg.activate(".")

@everywhere include("$(@__DIR__)/parallel.jl")

@everywhere using .PMod

greet() = print("Hello World!")

end # module

and added another file parallel.jl in the src directory containing:

module PMod

println("loaded")

end  # module PMod

I managed to load the module successfully, but I get redefinition warnings for the module loaded in parallel.

(v0.7) pkg> activate .

julia> using Distributed

julia> addprocs(2)
2-element Array{Int64,1}:
 2
 3

julia> using ParallelTest
[ Info: Precompiling ParallelTest [0e712700-ac5e-11e8-3696-ef47f8f04c4e]
loaded
      From worker 3:    loaded
      From worker 2:    loaded
      From worker 2:    WARNING: replacing module PMod.
WARNING: replacing module PMod.
      From worker 2:    loadedWARNING: replacing module PMod.

      From worker 3:    WARNING: replacing module PMod.
loaded
loaded
      From worker 2:    WARNING: replacing module PMod.
      From worker 2:    loaded
      From worker 3:    loaded
      From worker 3:    WARNING: replacing module PMod.
      From worker 3:    loaded

I also tried using .PMod instead of @everywhere using .PMod, but then I get a LoadError saying that PMod is not defined. I think the problem is that on the workers the module is in the global scope, while on the master process it is inside a module.
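A note on the repeated output, as far as I understand it: `@everywhere` evaluates its expression under Main on every process, and using ParallelTest makes the workers load the package too. Each of the three processes then runs the module body, and each run includes parallel.jl on all three, which would account for the nine "loaded" lines and the replacement warnings. A sketch of one way around it is to keep `@everywhere` out of the module and run it once from the top-level script. The file below is a stand-in for src/parallel.jl written to a temporary directory, and `square` is a made-up function.

```julia
using Distributed
addprocs(2)

# Stand-in for src/parallel.jl so the sketch is self-contained.
path = joinpath(mktempdir(), "parallel.jl")
write(path, """
module PMod
square(x) = x^2
end
""")

# Run once at top level: include the file by absolute path on every process.
# \$path interpolates the local value into the expression sent to the workers.
@everywhere include($path)
@everywhere using .PMod

squares = pmap(PMod.square, 1:4)
println(squares)
```

This assumes all workers run on the same machine and can read the same path. Here PMod lives in Main on every process, so the scope mismatch between master and workers does not arise.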


#8

Can you give a link to the issue so that I can follow?


#9

Was there any update on this? I’m failing to load my custom package on the workers. I also tried the recommended approach

@everywhere begin
    using Pkg
    Pkg.activate(".")
    using MyPackage
end

without luck. This was definitely easier in 0.6.


#10

I think using MyPackage should not be in the @everywhere block. What errors do you get?


#11

I had the same error as stated above:

ArgumentError: Package ... is required but does not seem to be installed:
 - Run `Pkg.instantiate()` to install all recorded dependencies.
etc.

but I found the reason just now. The startup.jl file is only executed on the main process, not on the workers, so LOAD_PATH on the workers did not include the required directories. On 0.6.4, juliarc.jl was always executed on the workers as well. Running @everywhere println(LOAD_PATH) shows this. So now I just need to find out how to get this LOAD_PATH to all my workers and I should be fine. No need for Pkg.activate and all that stuff.
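One way this could be done, as a sketch and not something verified against the setup described here: copy the main process's LOAD_PATH and push it to every worker with `@everywhere` and `$` interpolation. `main_load_path` and `worker_paths` are illustrative names.

```julia
using Distributed
addprocs(2)

# Snapshot of LOAD_PATH on the main process, including whatever startup.jl added.
main_load_path = copy(LOAD_PATH)

# Replace LOAD_PATH on every process with that snapshot.
@everywhere begin
    empty!(LOAD_PATH)
    append!(LOAD_PATH, $main_load_path)
end

# Fetch LOAD_PATH back from each worker to confirm it matches.
worker_paths = [remotecall_fetch(() -> copy(LOAD_PATH), w) for w in workers()]
println(worker_paths)
```

Relative entries in LOAD_PATH are resolved by each process separately, so absolute directories are the safer choice if the workers might run from a different working directory.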


#12

Bumping this thread, as there does not seem to be a working solution for everyone on this thread (nor for me).