@everywhere copy absolutely everything in main process

I have a long script that has a lot of functions defined and packages included. Then there is a big for loop at the end I want to parallelise. For a minimal example:

using Distributed
Distributed.addprocs(4)
@everywhere using Distributions
@everywhere pdfaddone(x)  = pdf(Normal(), x + 1)
@everywhere rootaddtwo(x) = sqrt(x + 2)
@everywhere const arr     = collect(1:10)
result = @distributed (vcat) for x in arr
    pdfaddone(rootaddtwo(x))
end
Distributed.rmprocs(workers()) # rmprocs(4) would only remove the worker with id 4

Now I need to put the @everywhere macro everywhere to send it all to the worker processes. I am not a fan of this because it makes the script harder to read by adding noise. It also means the logic for distributing the calculation is mixed with the calculation logic; ideally I would like to encapsulate the parallel logic somewhere else.
Is there a way to just add processes that already have absolutely everything as it exists in the main process? (For this application I don't really care if this is wasteful in terms of copying stuff not needed in the parallel calculation; I could clean it up later if it becomes a problem.) So basically, do something like:

using Distributions
pdfaddone(x)  = pdf(Normal(), x + 1)
rootaddtwo(x) = sqrt(x + 2)
const arr     = collect(1:10)

using Distributed
Distributed.addprocs(4)
@everywhere everything_in_main_process
result = @distributed (vcat) for x in arr
    pdfaddone(rootaddtwo(x))
end
Distributed.rmprocs(workers())

I don’t think something like that exists. What you could do is use a combination of Revise.jl and include().

So create your main code file, say main.jl, that has all your code. Then in the REPL (or a new file), you can do something like:

using Revise # loads Revise on the main process
using Distributed
addprocs(4)
@everywhere using Revise        # includet is only defined where Revise is loaded
@everywhere includet("main.jl") # or `using ModuleA` if wrapping your code in a module

Now you can go back to main.jl and any changes you make will be reflected across all workers. Note that there are some limitations to what Revise can do.
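Concretely, the workflow with the original example might look like the sketch below. To keep it self-contained it writes main.jl from the script; in practice main.jl is your existing code file. (Assumes Revise and Distributions are installed.)

```julia
using Distributed
addprocs(2)

# Stand-in for your existing application file; normally main.jl already exists.
write("main.jl", """
using Distributions
pdfaddone(x)  = pdf(Normal(), x + 1)
rootaddtwo(x) = sqrt(x + 2)
const arr     = collect(1:10)
""")

@everywhere using Revise          # Revise must be loaded on each worker for includet
@everywhere includet("main.jl")   # every process now loads and tracks main.jl

result = @distributed (vcat) for x in arr
    pdfaddone(rootaddtwo(x))
end
```

Edits to main.jl are then picked up on every process the next time Revise revises.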


How about a block?

@everywhere begin
    using A
    using B
    # …
end

This also works without indentation, so basically it just means adding one line at the top and one at the bottom of the part of your program that should run on all processes.
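Applied to the minimal example from the question, the whole preamble goes into one block (a sketch using the function names from the original snippet):

```julia
using Distributed
addprocs(4)

@everywhere begin
    using Distributions
    pdfaddone(x)  = pdf(Normal(), x + 1)
    rootaddtwo(x) = sqrt(x + 2)
    const arr     = collect(1:10)
end

result = @distributed (vcat) for x in arr
    pdfaddone(rootaddtwo(x))
end
rmprocs(workers())
```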


Thanks. These are both good suggestions.

It occurred to me that with Julia metaprogramming it might be possible for a macro to detect what needs to be sent to the other processes and then include it. You could go through the function bodies of whatever functions are called at the top level in the parallelised loop, detect which modules or other functions they call, record the modules, look into the bodies of the called functions, and so on recursively. When you get to the end, you assemble a big @everywhere statement to send everything that is needed. I haven't used the metaprogramming stuff much, though, so maybe this is not possible.
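As a proof of concept for the detection step, lowered code does expose the global names a function refers to. This is only a sketch: `called_globals` and `pdfplusone` are illustrative names, and a real implementation would have to recurse into callees, resolve modules, and handle dispatch across methods.

```julia
pdfplusone(x) = sqrt(x) + 1  # toy function so the example has no dependencies

# Collect the global names (functions, types, modules) referenced by
# the lowered code of `f` for the given argument types.
function called_globals(f, argtypes)
    ci = code_lowered(f, argtypes)[1]
    names = Symbol[]
    walk(x) = x isa GlobalRef ? push!(names, x.name) :
              x isa Expr      ? foreach(walk, x.args) : nothing
    foreach(walk, ci.code)
    unique(names)
end

called_globals(pdfplusone, (Float64,))  # e.g. [:sqrt, :+]
```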

If you put all of your functions and constants into a package, and then load the package (using MyPackage) after loading Distributed, then the functions will be defined on every worker automatically. Then you just need to use @everywhere, @spawnat, or remotecall to send data back and forth between workers. I believe Revise also works well if loaded after Distributed and before your package.
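A minimal sketch of that pattern, using Distributions as a stand-in for your own package: because the package is loaded after Distributed, its code is available on every worker, and its functions serialize by module path rather than by a binding in Main.

```julia
using Distributed
addprocs(2)
using Distributions   # loaded after Distributed: available on every worker

# No @everywhere needed: `pdf` and `Normal` live in the Distributions module,
# which each worker has loaded, so the call resolves remotely.
y = remotecall_fetch(pdf, 2, Normal(), 0.0)
```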
