How should a package use Distributed.jl? The goal is to have a package with some functions, so that you call MyPackage.run() in your main process and the package does all the ‘clever’ parallelisation in the background.
The package would be imported something like this:
using Distributed
addprocs(4)
using MyPackage
MyPackage.run()
It turns out you cannot do this.
Take for example:
using Distributed
addprocs(4)
module Good
    using Distributed
    @everywhere function Hello()
        println("Hello")
        return "Hello"
    end
end
@everywhere Hello()
This code works fine, but if we extract the module “Good” into a package, we face the following problem:
Package source (DistributingTestPackage.jl):
module DistributingTestPackage
    using Distributed
    @everywhere function Hello()
        println("Hello")
        return "Hello"
    end
    @everywhere println("Hello World!")
end
Main code:
using Distributed
addprocs(4)
using DistributingTestPackage
@everywhere Hello()
Running this code gives the following error:
On worker 2:
UndefVarError: `Hello` not defined
In other words, extracting a module into a package changes the way it interacts with Distributed.jl.
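One detail that may be related: as far as I understand the Distributed documentation, `@everywhere` evaluates its expression in `Main` on each process rather than in the enclosing module. Below is a small check of that behaviour in a plain script. The names `Demo` and `hello_demo` are made up for illustration, and no workers are needed because `@everywhere` also runs on the calling process.

```julia
using Distributed

module Demo
    using Distributed
    # Written inside the module, but evaluated by @everywhere in Main.
    @everywhere hello_demo() = "Hello"
end

# Check where the function actually ended up.
println("defined in Main: ", isdefined(Main, :hello_demo))
println("defined in Demo: ", isdefined(Demo, :hello_demo))
```

If that reading is right, the function in the package example never becomes part of the package module, which might be part of why the behaviour differs once the module is loaded as a package.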
Of course the problems above can be easily worked around by doing something like
using DistributingTestPackage
@everywhere using DistributingTestPackage
and then calling the functions in some other way.
However, requiring users of a package to call @everywhere using … every time they use the package does not feel like ‘best practice’. So what is the proper way of implementing multiprocessing in a package? Does anybody know of a package, e.g. on JuliaHub, that implements multiprocessing properly and that I could have a look at (possibly even in a simulation context)?
Some context: I have a package that runs a Monte Carlo simulation on N cores; afterwards the Monte Carlo results need to be averaged, etc. Due to the complexity, it would be nice to extract the Monte Carlo code into a package so that it can be reused.
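To make the use case concrete, here is a minimal sketch of the pattern, written as a plain script rather than a package. The names `estimate_pi_batch` and `run_montecarlo` are hypothetical, estimating π stands in for the real simulation, and the choice of 2 local workers is just an assumption for the demo.

```julia
using Distributed

# Assumption for the demo: start 2 local workers if none exist yet.
if nprocs() == 1
    addprocs(2)
end

@everywhere begin
    using Random

    # Hypothetical stand-in for one Monte Carlo batch:
    # estimate pi from n random points in the unit square.
    function estimate_pi_batch(seed::Integer, n::Integer)
        rng = MersenneTwister(seed)   # per-batch seed keeps batches independent
        hits = 0
        for _ in 1:n
            x, y = rand(rng), rand(rng)
            hits += (x^2 + y^2 <= 1.0)
        end
        return 4 * hits / n
    end
end

# Hypothetical driver: run one batch per seed on the workers, then average.
function run_montecarlo(nbatches::Integer; n::Integer = 100_000)
    estimates = pmap(seed -> estimate_pi_batch(seed, n), 1:nbatches)
    return sum(estimates) / length(estimates)
end

result = run_montecarlo(8)
println("Averaged estimate: ", result)
```

The open question is how to move `estimate_pi_batch` and `run_montecarlo` into a package without the user having to write the `@everywhere` part themselves.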