Nothing I can share publicly unfortunately, but here’s a lightly sanitized version:
module MyStuff

using Distributed
using Pkg

# Asynchronously create `n_workers` worker processes.
# Returns a Task; `fetch` it to get the vector of worker PIDs.
# The default manager adds one local worker per `addprocs` call.
function create_worker_processes(n_workers, manager=Distributed.LocalManager(1, true); revise=false)
    tasks = map(1:n_workers) do n
        @async create_worker_process(manager; revise, n)
    end
    return @async map(fetch, tasks)
end

function create_worker_process(manager; revise=false, n=nothing)
    n_str = n === nothing ? "" : " $(n)"
    worker_str = string("worker", n_str)
    @info "Requesting $(worker_str)..."
    # `only` throws unless the manager added exactly one worker
    pid = only(addprocs(manager))
    # make sure we activate the ACTUAL PROJECT that's active on the manager,
    # which may be different than `@.` during e.g. CI runs
    project = Pkg.project().path
    Distributed.remotecall_eval(Main, pid,
                                :(using Pkg; Pkg.activate($(project))))
    if revise
        @info "Loading Revise on $(worker_str)..."
        Distributed.remotecall_eval(Main, pid, :(using Revise))
    end
    @info "Loading MyStuff on $(worker_str)..."
    Distributed.remotecall_eval(Main, pid, :(using MyStuff))
    @info "$(worker_str) ready, PID $(pid)"
    return pid
end

end # module
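For anyone who wants to try the spawn-concurrently-then-fetch pattern without MyStuff, here's a minimal sketch. It assumes plain local workers via `addprocs(1)`, and `spawn_local_workers` is a made-up name for illustration, not part of the code above:

```julia
using Distributed

# Hypothetical stand-in for `create_worker_processes`: start `n` local workers
# concurrently and return a Task whose result is the vector of worker PIDs.
function spawn_local_workers(n)
    tasks = map(1:n) do _
        @async only(addprocs(1))  # each task requests exactly one worker
    end
    return @async map(fetch, tasks)
end

pending = spawn_local_workers(2)  # returns right away with a Task
pids = fetch(pending)             # blocks until both workers are up
@info "Workers ready" pids
rmprocs(pids)                     # shut the workers down again
```

The caller gets a Task back, so it can kick off several groups of workers first and only `fetch` once it actually needs the PIDs.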
passing around the manager is a bit of extra cognitive overhead but very useful when you’re juggling, say, different kinds of K8s resources (GPU-equipped pods for training, CPU-only for batching, etc.). in that case, we usually have another layer like
function provision_workers(config)
    train_workers = create_worker_processes(config.n_train_workers, train_manager(config))
    batch_workers = create_worker_processes(config.n_batch_workers, batch_manager(config))
    return (; train_workers = fetch(train_workers), batch_workers = fetch(batch_workers), config)
end
then we pass this “harness” to the functions that actually do the training/batching work so they know which workers to use. that is, there’s no magical distributed execution: the user/driver script has to specify which resources to use. this just makes it a bit more convenient to set those resources up so they’re actually usable, without the user having to do `@everywhere using MyStuff`, set the project, etc. by hand.
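as a rough illustration of a function consuming the harness, here's a sketch that restricts work to the batch workers via a `WorkerPool`. `run_batches` is a hypothetical name, and the local workers from `addprocs` stand in for the K8s-backed ones; our real code differs:

```julia
using Distributed

# Hypothetical consumer of the harness: run `f` over `inputs` using only the
# batch workers, leaving the training workers alone.
function run_batches(f, harness, inputs)
    pool = WorkerPool(harness.batch_workers)
    return pmap(f, pool, inputs)
end

# Stand-in harness built from local workers (assumption for this sketch).
harness = (; train_workers = addprocs(1), batch_workers = addprocs(2))

# `abs2` lives in Base, so it is available on every worker without extra setup.
results = run_batches(abs2, harness, 1:4)

# Shut down all workers created for the sketch.
rmprocs(vcat(harness.train_workers, harness.batch_workers))
```

the point is just that the worker selection is explicit: the function only ever touches the PIDs it was handed.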