I’ve been trying to wrap my head around how to use pmap() properly within a package, but I’m still a bit confused and am starting to suspect my original plan may not work.
For context: I’m working on a process-based ecological model that is computationally quite heavy (multiple hours per simulation run). A typical run follows this workflow:
1. read in the configuration file in getsettings()
2. based on this, initialise() the model object containing all state
3. carry out the simulation using the model object in simulate()
However, we frequently have to test different parameter combinations. So that we don’t have to create a separate configuration file for every single run, I have implemented a feature that allows us to define multiple values for a parameter in the configuration file. The getsettings() function recognises this, and signals to the initialise() and simulate() functions that they need to create and run multiple model objects all in one go (one for each combination of parameter values).
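To make the parameter scan concrete, here is a minimal sketch of how such an expansion could work. The function `expand_scan` and the parameter names are hypothetical stand-ins, not the actual `paramscan` implementation: every parameter listed in `scanparams` holds a vector of values, and one settings dictionary is produced per combination.

```julia
# Hypothetical stand-in for `paramscan`: build one settings dictionary for
# every combination of the values listed under the scanned parameters.
function expand_scan(settings::Dict{String,Any}, scanparams::Vector{String})
    isempty(scanparams) && return [settings]
    # Collect the candidate values of each scanned parameter.
    choices = [settings[p] for p in scanparams]
    combos = Dict{String,Any}[]
    # Iterators.product yields every combination as a tuple of values.
    for values in Iterators.product(choices...)
        s = copy(settings)
        for (p, v) in zip(scanparams, values)
            s[p] = v   # replace the vector with a single value
        end
        push!(combos, s)
    end
    return combos
end

# Made-up parameter names, for illustration only.
settings = Dict{String,Any}(
    "world.size"    => [10, 20],
    "nature.growth" => [0.1, 0.2, 0.3],
    "core.seed"     => 1,
)
combos = expand_scan(settings, ["world.size", "nature.growth"])
```

Each element of `combos` can then be handed to `map()` or `pmap()` independently, since no dictionary shares mutable scalar state with another.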
As long as I am doing this simply with map() it works just fine, but of course it would be nice to parallelise this to speed things up. Since there is no global state, this should be doable. So I thought I would just have to change the map() to pmap() and add some additional processes based on a configuration parameter:
function initialise(configfile::String=PARAMFILE)
    settings = getsettings(configfile)
    scanparams = settings["internal.scanparams"]
    if isempty(scanparams)
        initmodel(settings)
    else
        addprocs(settings["core.processors"]-1)
        pmap(initmodel, paramscan(settings, scanparams))
    end
end

function simulate(configfile::String=PARAMFILE)
    models = initialise(configfile)
    isa(models, Vector) ?
        pmap(simulate!, models) :
        simulate!(models)
end
However, this gives me an error: ERROR: LoadError: On worker 2: KeyError: key Base.PkgId(Base.UUID("039acd1d-2a07-4b33-b082-83a1ff0fd136"), "Persefone") not found, which I assume means that the worker processes don’t have access to the package I’m working in. I’ve tried fixing this with @everywhere include("src/Persefone.jl") before the addprocs() call above, but this just generates other errors. So I’m not sure how this could be made to work.
I am also starting to wonder whether pmap() is actually intended to be used within a package like this? (For instance, would this require me to prefix every one of our dozens of functions with @everywhere?) Or is pmap() rather intended to be used in an external script that loads a package and then does something with it?
In general, yes, it is tricky using pmap inside of functions, because you have to make sure that all of the necessary context is available on the remote workers: the package has to be loaded there, and any data the mapped function closes over has to be serialized and sent across. But it is possible, see for example the implementation of solve_batch for EnsembleProblem in SciMLBase.
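On the question of prefixing every function with @everywhere: that should not be necessary, since loading a module once on every process makes all of its functions available there. A minimal sketch of this, using a made-up `ToyModel` module as a stand-in for the real package (its `initmodel` and `simulate!` are toy versions written for this example, not Persefone's):

```julia
using Distributed
addprocs(2)   # two local workers, purely for illustration

# Define the stand-in module once on every process. For a real package,
# `@everywhere using Persefone` would play this role.
@everywhere module ToyModel
    export initmodel, simulate!

    mutable struct Model
        growth::Float64
        population::Float64
    end

    initmodel(settings::Dict) = Model(settings["growth"], settings["population"])

    function simulate!(m::Model; steps::Int=10)
        for _ in 1:steps
            m.population *= m.growth
        end
        return m
    end
end
@everywhere using .ToyModel

# None of the functions above carries its own @everywhere, yet the workers
# can call them because the whole module is loaded there.
settings_list = [Dict("growth" => g, "population" => 100.0) for g in (1.0, 2.0)]
results = pmap(s -> simulate!(initmodel(s)), settings_list)
populations = [m.population for m in results]

rmprocs(workers())   # shut the workers down again
```

The anonymous function passed to `pmap` is serialized to the workers, while `initmodel` and `simulate!` are looked up in the module that each worker loaded itself.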
Could you be more specific here? What other errors does it generate?
Hi, thanks for the quick answer! I’m afraid I was struggling to come up with a MWE, because of all the different things going on in the code (and me not being sure which of them are important or not).
However, I realised that putting the call to pmap() in a wrapper script would not actually be that difficult, even if it is not quite as nice-to-use as I would have wished. So I wrote the following script, which works fine:
using Distributed
@everywhere using Pkg
@everywhere Pkg.activate(".")
@everywhere using Persefone

function parallel_simulate(configfile::String=Persefone.PARAMFILE)
    settings = Persefone.getsettings(configfile)
    scanparams = settings["internal.scanparams"]
    !isempty(scanparams) &&
        delete!(settings, "internal.scanparams")
    if isempty(scanparams)
        Persefone.initmodel(settings) |> simulate!
    else
        combinations = Persefone.paramscan(settings, scanparams)
        pmap(combinations) do c
            Persefone.initmodel(c) |> simulate!
        end
    end
end

@time parallel_simulate()
If anybody still has any tips about using pmap() in the context of a package, I’d still be interested to hear them. But otherwise this has solved the pressing issue for now.
Ah, the link to that answer looks quite useful! And good to know about remotecall_eval, that hadn’t popped up in anything I’d read so far. I’ll take another look at this, maybe I can do something similar to my original plan after all.
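For reference, a minimal sketch of how `Distributed.remotecall_eval` might be used to load code on workers from inside a function, where `@everywhere using ...` is awkward because `using` needs top-level scope. The helper name `scan_in_parallel` is hypothetical, and the standard library `LinearAlgebra` stands in for the real package; passing the active project via `exeflags` is an assumption about how the workers would find a package under development.

```julia
using Distributed

# Hypothetical helper: start workers, load a module on them, run f over
# the inputs in parallel, and remove the workers again afterwards.
function scan_in_parallel(f, inputs; nworkers::Int=2)
    # Assumption: the workers should use the same project as the main process.
    pids = addprocs(nworkers; exeflags="--project=$(Base.active_project())")
    try
        # Evaluate `using LinearAlgebra` in Main on each new worker; for a
        # real package this would be `:(using Persefone)` instead.
        Distributed.remotecall_eval(Main, pids, :(using LinearAlgebra))
        return pmap(f, inputs)
    finally
        rmprocs(pids)
    end
end

# The closure refers to LinearAlgebra, which the workers loaded above.
norms = scan_in_parallel(v -> LinearAlgebra.norm(v), [[3.0, 4.0], [6.0, 8.0]])
```

Wrapping the work in `try ... finally` means the extra processes are removed even if one of the simulations throws.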
Just want to point out that DrWatson.jl has some great features that may be helpful for your situation. I’ve found it extremely helpful for establishing sets of parameters, running simulations, saving results, organizing simulations, etc.