Distributed workers automatically load code on "using"

So in the docs for Distributed we have:

Finally, if DummyModule.jl is not a standalone file but a package, then using DummyModule will load DummyModule.jl on all processes, but only bring it into scope on the process where using was called.

First, what exactly does it mean to load? Is it the same as import? And second, why is this behaviour wanted?

I found it a bit annoying in a few cases, and though it seems easy enough to work around, it just didn't feel consistent to me, since I still have to use @everywhere using ... to actually use packages I want on all workers.

One example is that I like having some extra packages on my main process for plotting the results and so on. Now I have to make sure to load all the packages I might want for post-processing the results before I add any workers, because if I try to run using Plots afterwards, when I do want to plot the results, it will fail since my workers do not have Plots installed.
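For concreteness, the failure I mean looks roughly like this (a sketch, assuming Plots is installed only in the master's environment and not in the workers' default one):

```julia
using Distributed
addprocs(2)      # workers added first, with their default environment

using Plots      # fails: Distributed also tries to load Plots on workers 2 and 3,
                 # which don't have it installed
```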

EDIT:
Realised I still had problems with Plots even after loading it beforehand: when I ran plot(some_results) it complained about GR_jll missing on some of the workers. Why should the plot command even affect the workers if I run it locally?

When package Foo is loaded on a worker, everything that normally happens when you call import Foo happens on that worker except that the name Foo is not added to the global namespace, so Foo.stuff on that worker won’t work.

I'm not sure it's the best possible choice of behavior; I personally wouldn't hate it if things were fully imported on the workers rather than just loaded. But the general idea is that it lets the workers operate on objects which were created on the master process, since the workers will always have the relevant package loaded, e.g.:

julia> using Distributed

julia> addprocs(1)
1-element Vector{Int64}:
 2

julia> using ComponentArrays # this loads the package on the worker too

julia> arr = ComponentArray(x=1)
ComponentVector{Int64}(x = 1)

julia> @fetch sum(arr) # this ran on the worker just fine
1

julia> @fetch ComponentArray # even though the name itself is not imported
ERROR: On worker 2:
UndefVarError: ComponentArray not defined

This could be because Plots lazily loads some packages on the first plot command, so these end up being loaded on the workers even though you ran using Plots before the workers were added. You could try triggering this lazy load by doing a dummy plot before adding any workers, or figuring out which packages it's loading (via the error message?) and loading them yourself first. Also note that by default addprocs workers don't share the same environment as the master (which will hopefully get fixed), but you can change that by doing addprocs(N, exeflags = "--project=$(Base.active_project())"), which might also solve those load errors.
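Something like this sketch, run before any addprocs (that GR/GR_jll is what Plots loads lazily is a guess based on your error message, not something I've verified):

```julia
using Plots
plot(rand(3))    # dummy plot: triggers Plots' lazy loading of backend packages
                 # (e.g. GR_jll) before any workers exist

using Distributed
# start the workers with the same project environment as the master,
# so packages loaded from now on can also be found on the workers
addprocs(4, exeflags = "--project=$(Base.active_project())")
```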


When package Foo is loaded on a worker, everything that normally happens when you call import Foo happens on that worker except that the name Foo is not added to the global namespace, so Foo.stuff on that worker won’t work.

I'm not sure it's the best possible choice of behavior; I personally wouldn't hate it if things were fully imported on the workers rather than just loaded. But the general idea is that it lets the workers operate on objects which were created on the master process, since the workers will always have the relevant package loaded, e.g.:

Okay, I guess that could make sense, but I'm not sure it is the best solution.

So if I want to use mean instead of sum in your example, I would need to run @everywhere using Statistics, but only using ComponentArrays? Why not just require that things which should be used everywhere are also loaded with @everywhere? Then it would be very clear when you load something everywhere, and you could also choose to load a package only locally, which seems to be impossible now.
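To illustrate what I mean, a minimal sketch with just the stdlib Statistics package:

```julia
using Distributed
addprocs(1)

# mean is not enough with a plain `using`: Statistics has to be loaded with
# @everywhere so the name is in scope on the workers, not just the master.
@everywhere using Statistics

# Runs on worker 2, where mean is now defined:
@fetch mean([1, 2, 3])   # returns 2.0
```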

…but you can change that by doing addprocs(N, exeflags = "--project=$(Base.active_project())") which might also solve those load errors.

Yeah, the problem is that I don't run processes on the local machine but on many remote machines, and I felt it would be nicer to replicate only the computation environment on those, while plotting stays on the local one. It is easy to solve by also installing the plotting packages on the remote ones, but it feels like it would be simple to allow this, and since they are not needed on the workers it would be nice to not have to have them there.