Through some trial and error I realized that modules loaded with using are also loaded on already existing workers. Therefore the following code runs without an error:
using Distributed
addprocs(2)
using PyPlot
pmap(workers()) do w
    w, myid(), PyPlot.version
end
I find this behavior quite inconvenient: in a situation with many workers (to generate some data) connected to one Jupyter notebook kernel (to analyze this data), I would rather not load a plotting package on all workers.
Somewhat inconsistently with this behavior, I found that a module which was loaded in the main process with using before the workers were created will not be loaded on the workers by a later using. Julia seems to know that the module is already present in process 1 and therefore does not load it on the workers.
Therefore this code gives errors:
using Distributed
using PyPlot
addprocs(2)
using PyPlot
pmap(workers()) do w
    w, myid(), PyPlot.version
end
which can certainly be resolved by prefixing the second using with @everywhere:
using Distributed
using PyPlot
addprocs(2)
@everywhere using PyPlot
pmap(workers()) do w
    w, myid(), PyPlot.version
end
Is this the intended behavior? Is there a way to load a module only on the local process even if there are already workers running?
The intended behaviour is that using loads the packages on the workers so that type definitions are available and you can serialize and deserialize messages between the workers.
I consider the second behaviour, where doing using; addprocs; using does not load the package on the recently added workers, a bug, which I attempted to fix in https://github.com/JuliaLang/julia/pull/28860
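To check which processes actually hold a package in either scenario, a small diagnostic like the following may help. It is only a sketch: is_loaded is a hypothetical helper name, and Base.loaded_modules is an unexported internal of Base, so it may change between Julia versions.

```julia
using Distributed
addprocs(2)

# Hypothetical helper, defined on every process: true if a package with the
# given name has been loaded there. Base.loaded_modules is internal to Base.
@everywhere is_loaded(name) = any(id -> id.name == name, keys(Base.loaded_modules))

# Ask process 1 and each worker whether PyPlot is loaded.
for p in procs()
    println("process ", p, ": ", remotecall_fetch(is_loaded, p, "PyPlot"))
end
```

Running the loop after each of the using / addprocs orderings discussed above shows where the package ended up.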
I have the exact same use case, although I’ve slowly trained myself to separate imports before/after addprocs depending on whether I want them on the workers or not.
That said, I just played around with it a bit and the following seems like a way to remove the auto-loading feature:
using Distributed
addprocs()
filter!(!=(Distributed._require_callback), Base.package_callbacks)
using PyPlot # now only loaded on master process
pmap(workers()) do w
    w, myid(), PyPlot.version
end # will now error
and you can still run @everywhere using PyPlot later and it will work. There may well be other things I haven’t thought of that this breaks, though, so I’d be careful messing with the internals like this.
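If removing the callback permanently feels too invasive, one variation is to remove it only while a single package is loaded and then put it back. The sketch below assumes the same internals as the snippet above (Distributed._require_callback and Base.package_callbacks, both undocumented); without_worker_loading is a hypothetical name, not part of Distributed.

```julia
using Distributed

# Hypothetical helper: run `f` with Distributed's package callback removed,
# then restore the callback if it was registered before.
# Relies on undocumented internals that may change between Julia versions.
function without_worker_loading(f)
    cb = Distributed._require_callback
    was_active = cb in Base.package_callbacks
    filter!(x -> x !== cb, Base.package_callbacks)
    try
        return f()
    finally
        # Restore the default behaviour even if `f` throws.
        was_active && push!(Base.package_callbacks, cb)
    end
end
```

Since using has to run at top level, the load itself would go through eval, for example without_worker_loading(() -> Core.eval(Main, :(using PyPlot))). Later using statements then propagate to the workers as before.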
I still find this a bit inconsistent, since statements like x=1 need an @everywhere to have an effect on workers, but using XYZ does not.
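The difference for plain assignments can be seen with a minimal sketch like this one (the variable names x and y are just placeholders):

```julia
using Distributed
addprocs(2)

x = 1              # defined on process 1 only
@everywhere y = 2  # defined on process 1 and on every worker

w = first(workers())
# Ask the worker whether each global exists in its own Main module.
x_on_worker = remotecall_fetch(isdefined, w, Main, :x)   # false
y_on_worker = remotecall_fetch(isdefined, w, Main, :y)   # true
println("x defined on worker: ", x_on_worker)
println("y defined on worker: ", y_on_worker)
```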
And yes, if using XYZ finished fast and only made type definitions visible on the workers, I would not care. But right now I usually have 200+ workers connected to my master process. Running using Plots would probably crash my hard drive and the network.
In the meantime I thought that @everywhere [1] using XYZ would probably do the job, but no: using overrules the specification of the worker processes [1].
I would definitely vote for some kind of @onlyhere macro.
Does import also load onto every worker? If not, you could use ImportAll.jl (https://github.com/NTimmons/ImportAll.jl).
There’s probably a better way, though (like the above, or just loading the package before adding workers)