How am I supposed to use pmap?


#1

I have a large project where one function includes a pmap call, and I'm not sure what the right way is to get the code in scope on all the workers.

  1. @everywhere include("code.jl"), where code.jl uses all the packages and includes my functions, then running a function from my code. This doesn't work and produces a lot of warnings/errors about reloading modules.
  2. Wrapping all my code in a module MyModule, doing include("MyModule.jl"), then using MyModule. This doesn't work: MyModule isn't defined on the workers.
  3. @everywhere include("MyModule.jl"), then using MyModule. The first line produces a ton of warnings about replacing modules from packages, which surprised me because I haven't used MyModule yet. It works fine apart from the warnings.
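A minimal sketch of why option 2 fails, assuming Julia 1.x (where addprocs and pmap live in the Distributed standard library): include() evaluates a file only on the process that calls it, so the workers never see the module. The temporary file and its contents are made up for the demonstration.

```julia
using Distributed
addprocs(1)

# Hypothetical stand-in for MyModule.jl, written to a temporary directory.
path = joinpath(mktempdir(), "MyModule.jl")
write(path, "module MyModule\nsquare(x) = x * x\nend\n")

w = first(workers())

include(path)   # evaluates the file on the master (process 1) only
before = remotecall_fetch(() -> isdefined(Main, :MyModule), w)   # false

# Evaluates the file on every process, master included; the master prints a
# "replacing module" warning because it already had MyModule (option 3).
@everywhere include($path)
after = remotecall_fetch(() -> isdefined(Main, :MyModule), w)    # true
```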

#3

Works fine for me.

julia> @everywhere module dummyModule
           function dummyFunction(i)
               i^2
           end
       end

julia> pmap(dummyModule.dummyFunction, 1:10)
10-element Array{Any,1}:
   1
   4
   9
  16
  25
  36
  49
  64
  81
 100

You may have forgotten to run @everywhere workspace() to clear the modules defined by previous failed attempts. include() simply evaluates the code from the file, so it defines the module again every time you call it.
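A small single-process sketch of the point about include(), assuming Julia 1.x. Note that workspace() was removed in Julia 0.7, so that part of the advice only applies to older versions. The file name is hypothetical.

```julia
# Hypothetical file containing the dummyModule from the example above.
path = joinpath(mktempdir(), "dummy.jl")
write(path, "module dummyModule\ndummyFunction(i) = i^2\nend\n")

include(path)
first_def = dummyModule

include(path)   # prints a "replacing module dummyModule" warning
second_def = dummyModule

# Each include() evaluated the module expression again and created a new module,
# so the two module objects are different.
println(first_def === second_def)
```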


#4

To avoid the warnings, first import each package (on the master process), then run @everywhere using for each package.
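A sketch of that pattern, assuming Julia 1.x and using the LinearAlgebra standard library as a stand-in for your own packages. As far as I know, on current versions @everywhere already runs using/import statements on the calling process first, so the separate import line mostly matters on older releases.

```julia
using Distributed
addprocs(2)

import LinearAlgebra              # load the package on the master first
@everywhere using LinearAlgebra   # then bring its names into scope on every process

# The anonymous function calls `dot`, so each worker needs LinearAlgebra in scope.
results = pmap(x -> dot([x, 1], [2, 3]), 1:3)   # results == [5, 7, 9]
```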


#5

Alternatively, you can add the path containing your module to LOAD_PATH:

push!(LOAD_PATH,"YourPath")
@everywhere using MyModule
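A self-contained version of this approach, assuming Julia 1.x. To avoid depending on whether workers inherit the master's LOAD_PATH, the sketch pushes the directory onto LOAD_PATH on every process before loading the module. The temporary directory and the square function are hypothetical; substitute "YourPath" and your own module.

```julia
using Distributed
addprocs(2)

# Hypothetical module directory; in practice this is "YourPath".
# The file name must match the module name: MyModule.jl for MyModule.
moddir = mktempdir()
write(joinpath(moddir, "MyModule.jl"), """
module MyModule
export square
square(x) = x * x
end
""")

# Add the directory on every process so each one can locate MyModule.jl.
@everywhere push!(LOAD_PATH, $moddir)

# Load MyModule and bring `square` into scope on every process.
@everywhere using MyModule

results = pmap(square, 1:5)   # results == [1, 4, 9, 16, 25]
```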

I think there is a way to permanently add a path to the LOAD_PATH instead.


#6

You can add the push!(LOAD_PATH, ...) statement to your Julia startup file (e.g. ~/.juliarc.jl on Linux; in Julia 0.7 and later the file is ~/.julia/config/startup.jl).


#7

I tried adding the import statement for every package inside my module, then using the packages. It didn't help. Do you mean the module should use @everywhere using?

I also tried push!(LOAD_PATH, pwd()) because the current directory contains mymodule.jl, but using MyModule doesn’t work (MyModule not found in current path).

It is working fine except for the warnings, so this isn’t really a problem that needs to be solved, but I feel like I’m not approaching this in the proper manner.


#8

Not inside the module, but in the script you run, e.g.:

import Plots
import MyModule
@everywhere using Plots
@everywhere using MyModule

#9

The warnings appear when I load the code for the module.

@everywhere include("mymodule.jl")
WARNING: replacing module ForwardDiff.
WARNING: replacing module ForwardDiff.
WARNING: replacing module ForwardDiff.
WARNING: replacing module ForwardDiff.
WARNING: replacing module BenchmarkTools
WARNING: replacing module BenchmarkTools
WARNING: replacing module BenchmarkTools
WARNING: replacing module BenchmarkTools
......
using MyModule

After that it works. I don’t find @everywhere using to be necessary.


#10
include("mymodule.jl")
@everywhere using MyModule

#11
julia> include("mymodule.jl")

julia> @everywhere using MyModule
ERROR: On worker 2:
ArgumentError: Module MyModule not found in current path.
Run `Pkg.add("MyModule")` to install the MyModule package.

Doesn’t work.


#12

Do you have a minimal example?

I typically use this approach:

  1. Put everything you need distributed to workers into a module.
  2. Save the module to the filesystem.
  3. Ensure the directory containing the module is in LOAD_PATH.
  4. addprocs()
  5. using Module

Notes:

  • It is important to call addprocs before using Module.
  • You shouldn't need @everywhere.
  • You shouldn't get warnings about replacing modules.

Here’s an example:

module ModuleA
    export testA
    function testA(x)
        println(x, " -> ", x*x)
        return x*x
    end
end

Save this to the file ModuleA.jl (or ModuleA/src/ModuleA.jl).

Now run this:

addprocs()

# ensure ModuleA's directory is in LOAD_PATH
# in this example, ModuleA.jl was saved next to this script (but it can be any directory)
thisDir = dirname(@__FILE__())
thisDir in LOAD_PATH || push!(LOAD_PATH, thisDir)

using ModuleA

pmap(testA, 1:10)

And here’s the output:

Julia-0.5.2> include("loading_pmap.jl")
        From worker 2:  1 -> 1
        From worker 3:  4 -> 16
        From worker 5:  2 -> 4
        From worker 4:  3 -> 9
        From worker 2:  5 -> 25
        From worker 5:  6 -> 36
        From worker 4:  7 -> 49
        From worker 3:  8 -> 64
        From worker 2:  9 -> 81
        From worker 4:  10 -> 100
10-element Array{Any,1}:
   1
   4
   9
  16
  25
  36
  49
  64
  81
 100

#13

This worked well, thanks. I realized what my problem was with the LOAD_PATH approach I tried earlier: the module file needs to have the same name as the module, so ModuleA.jl for ModuleA.
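A quick way to check this, assuming Julia 1.x: Base.find_package (an unexported helper that returns the path the loader would use, or nothing) only locates a module in a LOAD_PATH directory when the file is named after the module (or lives at ModuleName/src/ModuleName.jl). The directory and ModuleB are hypothetical.

```julia
moddir = mktempdir()
push!(LOAD_PATH, moddir)

# File name does not match the module name, so the loader cannot find it.
write(joinpath(moddir, "code.jl"), "module ModuleB\nend\n")
before = Base.find_package("ModuleB")   # nothing

# File named after the module: now it is found.
write(joinpath(moddir, "ModuleB.jl"), "module ModuleB\nend\n")
after = Base.find_package("ModuleB")    # path ending in "ModuleB.jl"
```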

I did get a mysterious error after running my code though.

using PyPlot
WARNING: An error occurred during inference. Type inference is now partially disabled.
Base.MethodError(f=typeof(Core.Inference.convert)(), args=(Base.AssertionError, "invalid age range update"), world=0x0000000000000abf)
... hundreds of lines of errors

This is probably unrelated to the current situation though. Maybe an 0.6 RC1 bug.


#14

Using the example from @greg_plowman works for me too, but if I change the pmap command to be:

pmap(t -> testA(t), 1:10)

Then the worker cannot find testA. Any idea how to resolve this?


#15

pmap(t -> ModuleA.testA(t), 1:10)


#16

Thanks. Using @everywhere using ModuleA instead also works.
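A self-contained sketch of the variants from #12, #15 and #16, assuming Julia 1.x. To keep it file-free, ModuleA is defined on every process with @everywhere module (as in the dummyModule example), and using .ModuleA (the leading dot refers to a module defined in Main) stands in for loading it from LOAD_PATH. The closure t -> testA(t) only works once testA is visible in Main on the workers, which is what @everywhere using provides.

```julia
using Distributed
addprocs(2)

# Define ModuleA on every process.
@everywhere module ModuleA
    export testA
    testA(x) = x * x
end

using .ModuleA   # `testA` is now in scope on the master only

# Passing the function itself works: it is sent as a reference to ModuleA.testA.
r1 = pmap(testA, 1:3)

# A qualified name inside the closure also works (the fix from #15).
r2 = pmap(t -> ModuleA.testA(t), 1:3)

# The unqualified closure needs `testA` in scope on the workers (the fix from #16).
@everywhere using .ModuleA
r3 = pmap(t -> testA(t), 1:3)
```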