Module loading on workers

pfarndt · December 3, 2019, 5:22pm

Through some trial and error I realized that modules loaded with using are also loaded on all already existing workers. Therefore the following code runs without an error:

using Distributed
addprocs(2)

using PyPlot

pmap(workers()) do w
   w, myid(), PyPlot.version
end

I find this behavior quite inconvenient: with many workers (generating data) connected to one Jupyter notebook kernel (analyzing that data), I would rather not load a plotting package on all workers.

Somewhat inconsistently with this behavior, I found that a module which is already loaded in the main process is not loaded on the workers if it was loaded with using before the workers were created. Julia seems to know that the module is already there in process 1, but does not load it on the workers.

Therefore this code gives errors:

using Distributed
using PyPlot

addprocs(2)

using PyPlot

pmap(workers()) do w
   w, myid(), PyPlot.version
end

which can certainly be resolved by prefixing the second using with @everywhere:

using Distributed
using PyPlot

addprocs(2)

@everywhere using PyPlot

pmap(workers()) do w
   w, myid(), PyPlot.version
end

Is this the intended behavior? Is there a way to load a module only on the local process even if there are already workers running?

vchuravy · December 3, 2019, 6:51pm

The intended behaviour is that using loads the packages on workers so that type definitions are available and you can serialize and deserialize messages between the workers.

I consider the second behaviour, where using; addprocs; using does not load the package on the recently added workers, a bug, which I attempted to fix in https://github.com/JuliaLang/julia/pull/28860
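
To see which processes actually have a package loaded, a small diagnostic along these lines may help. This is only a sketch: loaded_on is a hypothetical helper name, and Base.loaded_modules is an internal (undocumented) dictionary keyed by Base.PkgId, so it may change between Julia versions.

```julia
using Distributed

# Hypothetical helper: map each process id to whether a package called `name`
# is loaded there. Relies on the internal Base.loaded_modules dictionary.
function loaded_on(name::AbstractString, pids = procs())
    # This closure is sent to and executed on each target process.
    check = () -> any(id -> id.name == name, keys(Base.loaded_modules))
    return Dict(p => remotecall_fetch(check, p) for p in pids)
end
```

For example, calling loaded_on("PyPlot") after each of the using / addprocs orderings above should show where the package ended up.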

marius311 · December 3, 2019, 6:55pm

I have the exact same use case, although I’ve slowly trained myself to separate imports before/after addprocs depending on whether I want them on the workers or not.

That said, I just played around with it a bit and the following seems like a way to remove the auto-loading feature:

using Distributed
addprocs()

filter!(!=(Distributed._require_callback), Base.package_callbacks)

using PyPlot # now only loaded on master process

pmap(workers()) do w
   w, myid(), PyPlot.version
end # will now error

and you can still @everywhere using PyPlot later and it will work. There may well be other things I haven’t thought of that this breaks though so I’d be careful messing with the internals like this.

pfarndt · December 3, 2019, 7:50pm

I still find this a bit inconsistent, since statements like x=1 need an @everywhere to have an effect on workers - but using XYZ does not.

And yes - if using XYZ finished quickly and only made type definitions visible on the workers, I would not care. But right now I usually have 200+ workers connected to my master process. Running using Plots would probably crash my hard drive and the network.

In the meantime I thought that @everywhere [1] using XYZ would probably do the job - but no - using overrides the process specification [1].

I would definitely vote for some kind of @onlyhere macro.
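
For contrast, here is a minimal sketch of how the process list in @everywhere restricts an ordinary statement such as x = 1. It assumes two local workers can be started; the names on_master and on_workers are only for illustration.

```julia
using Distributed
addprocs(2)

# Evaluate the assignment on process 1 only.
@everywhere [1] x = 1

# Check on each process whether Main.x exists.
on_master  = isdefined(Main, :x)
on_workers = [remotecall_fetch(() -> isdefined(Main, :x), w) for w in workers()]
```

Here x should be defined on process 1 and on none of the workers, which is exactly the restriction that using does not respect.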

dstarerstor · December 5, 2019, 5:34pm

Does import also load onto every worker? If not, you could use ImportAll.jl
https://github.com/NTimmons/ImportAll.jl
There’s probably a better way, though (like the above, or just loading the package before adding workers)
