How to load a whole "project" into a worker with @everywhere

Hi, I have a very complex function called “simulate”.
I want to run it remotely with this call:

 results = remotecall_fetch(simulate, 2,
                                SIS(),
                                get!(runningparams, :df, nothing),
                                get!(runningparams, :intervals, nothing),
                                get!(runningparams, :user2vertex, nothing),
                                get!(runningparams, :loc2he, nothing),
                                convert(Dates.Millisecond, Dates.Minute(test[:δ]));
                                Δ = test[:Δ],
                                vstatus = per_infected_data[test[:infected_percentage]],
                                per_infected = test[:infected_percentage],
                                c = test[:c],
                                βd = test[:βd],
                                βᵢ = test[:βᵢ],
                                βₑ = test[:βₑ],
                                γₑ = test[:γₑ],
                                γₐ = test[:γₐ],
                                niter = 2,
                                output_path = res_path,
                                store_me = false)

The function relies on several sub-functions (and a type) defined in other files.
I can’t load them properly on my remote workers (using @everywhere).
Can anyone give me some advice?

All code is here: https://github.com/GalloLuigi/HGEpidemics-main/blob/main/src/epidemics/SIS.jl

Could you be a bit more specific about

I can’t load them properly on my remote workers (using @everywhere).

What errors do you get?

Some general advice:

  • instead of passing a lot of arguments to a function, pass a dictionary or, even better, a struct with the simulation parameters. This also makes storing simulation results easier. A great package for organizing and documenting the results of scientific simulations is DrWatson. You do not necessarily have to use it, but I highly recommend reading its documentation to get an idea of how to manage simulation results. It really helped me with the overall organization of my scientific work.
  • for distributed calculations, the worker processes have to know the functions/code they should execute. For short code, a couple of @everywhere annotations before the functions/data that need to be available to all processes should do the job. For more complex simulations, I would recommend writing the remotely called functions into a Julia file (script.jl) or, even better, your own module, and loading that with @everywhere include("path/to/script/script.jl") or @everywhere using MyModule. The files/module need to be available to all Julia processes, e.g. via a shared project folder (see the sketch below).
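
For concreteness, here is a minimal sketch of both loading patterns. The worker count, file path, and module name are placeholders, and it assumes the script/project folder is visible at the same path on every machine:

    using Distributed

    # Start workers; exeflags="--project" makes them activate the same
    # project environment as the master (useful with a shared project folder).
    addprocs(4; exeflags="--project")

    # Option A: include a script on every process. Interpolating with $ sends
    # the master's absolute path, so each worker includes the same file.
    script = abspath(joinpath("src", "simulation_code.jl"))  # hypothetical path
    @everywhere include($script)

    # Option B (better for larger projects): put everything in a module/package
    # that each worker's environment can find, then load it everywhere.
    @everywhere using MyModule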

What I do mostly is:

  • create remote processes
  • make code available by loading my own modules with @everywhere. These modules define all the code the remote processes will ever execute.
  • create my simulation parameters on the host process
  • pass a struct with my simulation parameters to a function called simulate (as you do), which I call on the remote workers, e.g. via remotecall_fetch, pmap, or @spawnat (see the sketch after this list)
  • from there on, the remote processes do their calculations and send their results back to the host process, which then stores the results, does light post-processing, etc.
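
Put together, that workflow could look roughly like the sketch below. All names (SimParams, its fields, the simulation body) are hypothetical stand-ins; in a real setup the @everywhere begin ... end block would live in your own module, loaded as shown above:

    using Distributed
    addprocs(2)

    # Inlined here so the sketch is self-contained; normally this code would
    # live in a module loaded with @everywhere using MyModule.
    @everywhere begin
        Base.@kwdef struct SimParams
            Δ::Int = 1
            niter::Int = 2
            per_infected::Float64 = 0.1
        end

        function simulate(p::SimParams)
            # stand-in for the real, expensive simulation
            return (Δ = p.Δ, niter = p.niter, score = p.Δ * p.niter * p.per_infected)
        end
    end

    # Build the parameter sweep on the host process...
    params = [SimParams(Δ = d, per_infected = f) for d in 1:3, f in (0.1, 0.2)]

    # ...and let pmap distribute one SimParams per call to the workers,
    # collecting the results back on the host.
    results = pmap(simulate, vec(params))

    # A single call on a specific worker also works:
    one_result = remotecall_fetch(simulate, 2, SimParams(Δ = 5))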

This process works best for embarrassingly parallel execution of the same simulation with different parameters.

I further recommend reading the Julia documentation on distributed computation. It is very well written.

I hope this helps a bit.

I made a small package to help me sync local dev projects to workers, DistributedEnvironments.jl. Not sure if that would help, but if you put all your functionality in a package, you could use it to push local changes to the workers before running.