Distributed workflow for MCMC

I would like to set up an MCMC workflow using Distributed.

I have a Julia script that does the following:

  1. load all packages
  2. load the data
  3. run an MCMC chain, with index i, for i in 1:5
  4. save the result in some_chain_$i.jld2
  5. done! send the user a message.
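For concreteness, the serial version of the steps above might look roughly like this (the package set, `load_data`, `make_logdensity`, `my_mcmc_runner`, and `notify_user` are placeholders inferred from the description, not real code):

```julia
using DynamicHMC, JLD2                 # 1. load all packages (assumed set)

data = load_data()                     # 2. load the data (placeholder)
logdensity = make_logdensity(data)     # placeholder target density

for i in 1:5                           # 3. run each MCMC chain, serially
    chain = my_mcmc_runner(data, logdensity)   # placeholder runner
    jldsave("some_chain_$i.jld2"; chain)       # 4. save the result
end

notify_user("all chains done")         # 5. done! (placeholder notification)
```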

I would like to parallelize step 3 with Distributed. Is there a tutorial that would get me started? I have never used this package before, so sorry if not all of my questions make sense.

I am running everything on a single server which I fully control, so processes are local. Is it sufficient to just use addprocs(5) with the local manager?

Is it enough if I load packages using @everywhere?

For the core computation, can I just do something like this:

remotes = [remotecall(my_mcmc_runner, w, data, logdensity) for w in workers()]
map(fetch, remotes)

to automatically finish when all tasks are done?

Slight tangent, but IIUC Pigeons.jl is essentially a distributed MCMC engine; see the Custom MCMC page of the Pigeons.jl docs.

Thanks, but that’s not what I want to do, I want to use DynamicHMC.jl.

I just need help with Distributed, as explained above.

MCMC is just the context, to indicate the kind of parallelism.

Not a direct answer, but consider using DistributedNext.jl instead. Worker startup is much faster, and it has other nice improvements.


To answer these questions: Yes to all of them.
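One detail worth spelling out: after `addprocs(5)`, process 1 is the master and the workers have IDs 2–6, so it is safest to iterate over `workers()` rather than `1:5`. A minimal end-to-end sketch of the whole pattern (`my_mcmc_runner` and the `data`/`logdensity` objects are placeholders from the question):

```julia
using Distributed

addprocs(5)                          # LocalManager: 5 worker processes on this server

@everywhere using DynamicHMC, JLD2   # packages must be loaded on every worker

# placeholder definition, available on all workers
@everywhere function my_mcmc_runner(i, data, logdensity)
    # run chain i with DynamicHMC here; return the chain
end

data = nothing        # placeholder for your data
logdensity = nothing  # placeholder for your target density

# one remotecall per worker; fetch blocks until each task is done
futures = [remotecall(my_mcmc_runner, w, i, data, logdensity)
           for (i, w) in enumerate(workers())]
chains = map(fetch, futures)
```

`pmap(i -> my_mcmc_runner(i, data, logdensity), 1:5)` expresses the same thing more compactly and handles the task-to-worker scheduling for you.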

Perhaps consider controlling the RNGs of your workers. It's unlikely that two of them would start with the same seed, but for reproducibility explicit seeding is of course beneficial nonetheless.
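A sketch of explicit per-worker seeding (the seed values are arbitrary; any scheme giving each worker a distinct, fixed seed works):

```julia
using Distributed, Random

addprocs(5)
@everywhere using Random

# seed each worker's default RNG with a distinct, reproducible value
for (i, w) in enumerate(workers())
    remotecall_wait(Random.seed!, w, 1000 + i)
end
```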
