Distributed workflow for MCMC

I would like to set up an MCMC workflow using Distributed.

I have a Julia script that does the following:

  1. load all packages
  2. load the data
  3. run an MCMC chain, with index i, for i in 1:5
  4. save the result in some_chain_$i.jld2
  5. done! send the user a message.

I would like to parallelize step 3 with Distributed. Is there a tutorial that would get me started? I have never used this package before, so apologies if not all of my questions make sense.

I am running everything on a single server which I fully control, so processes are local. Is it sufficient to just use addprocs(5) with the local manager?

Is it enough if I load packages using @everywhere?
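For concreteness, here is roughly what I have in mind for the setup (the package names are just the ones from my script, and this is only a sketch):

```julia
using Distributed
addprocs(5)  # local manager: spawn 5 worker processes on this machine

# load packages on the master *and* on every worker
@everywhere using DynamicHMC, JLD2
```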

For the core computation, can I just do something like this:

remotes = [remotecall(my_mcmc_runner, w, data, logdensity) for w in workers()]
map(fetch, remotes)

(using workers() rather than 1:5, since after addprocs(5) the worker ids are 2:6 and pid 1 is the master)

to automatically finish when all tasks are done?

Slight tangent, but IIUC Pigeons.jl is essentially a distributed MCMC engine: Custom MCMC · Pigeons.jl

Thanks, but that’s not what I want to do, I want to use DynamicHMC.jl.

I just need help with Distributed, as explained above.

MCMC is just the context, to indicate the kind of parallelism.

Not a direct answer, but consider using DistributedNext.jl instead. Much faster to start a worker and other nice improvements.
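As far as I know it is meant as a drop-in replacement for Distributed, so assuming the same API, the only change in the sketch above would be the package you load:

```julia
using DistributedNext  # instead of `using Distributed`

addprocs(5)  # same call as with Distributed, just with faster worker startup
@everywhere using DynamicHMC  # @everywhere etc. work the same way
```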

To answer these questions: Yes to all of them.

Perhaps consider controlling the RNGs of your workers. It is unlikely that two of them would realistically start with the same seed, but seeding them explicitly is of course beneficial for reproducibility nonetheless.
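A minimal sketch of explicit per-worker seeding (assuming your sampler draws from the default global RNG of the Random stdlib):

```julia
using Distributed

@everywhere using Random

# give each worker a distinct, fixed seed so runs are reproducible
for (k, w) in enumerate(workers())
    remotecall_wait(Random.seed!, w, 1234 + k)
end
```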

My understanding is that pmap does basically all you need under the hood, so you should just try that as a first pass.
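A sketch of the whole workflow with pmap, reusing the function and file names from the question; the sampler body and the data/logdensity definitions are placeholders, not working DynamicHMC code:

```julia
using Distributed
addprocs(5)

@everywhere using JLD2  # plus your MCMC packages, e.g. DynamicHMC

@everywhere function my_mcmc_runner(i, data, logdensity)
    chain = nothing  # placeholder: run your MCMC chain with index i here
    jldsave("some_chain_$i.jld2"; chain)
    return i
end

data = randn(100)                    # placeholder data
logdensity = x -> -sum(abs2, x) / 2  # placeholder log-density

# pmap farms the 5 chains out to the workers and returns
# only once all of them have finished
pmap(i -> my_mcmc_runner(i, data, logdensity), 1:5)
```

Note that pmap also handles load balancing for you, so it generalizes nicely if you later want more chains than workers.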