Parallel mc recommendation

For the following code, which runs independent chains of Random Walk Metropolis, there is the obvious potential for parallelization. I am wondering what strategies people would recommend. This is related to a prior question I had about how to use pmap when it was being applied to something which was an in place transformation.

function Boltzmann_likelihood(x, V, beta)
    w = exp(-beta * V(x));
    return w

function RWM!(X0, V::Vf, beta; niterations=10^2, Dt = 1.e-1) where {Vf}

    Xp = similar(X0);

    naccept = 0;
    p0 = Boltzmann_likelihood(X0, V, beta);

    gaussian_coef =  sqrt(2 * Dt/beta);

    for j = 1:niterations

        @. Xp = X0 + gaussian_coef * randn();

        pp = Boltzmann_likelihood(Xp, V, beta);
        a = min(1, pp/p0);
        #a = 1.

        if rand()<a
            naccept = naccept+1;
            @. X0 = Xp;
            p0 = pp;

nsamples = 10^2;
d= 2;
beta = 1;

function V(x)
    return (dot(x,x)-1)^2

X0vals = [randn(d) for i=1:nsamples];
Xbar= zeros(d);

for k = 1:nsamples
    RWM!(X0vals[k], V, beta)

    @. Xbar += X0vals[k]/nsamples;


Something like (very stylized)

random_seeds = [srand() for _ in 1:5];

function do_mcmc(rng)
    # your code that does mcmc using the
    # random number generator explicitly

chains = pmap(do_mcmc, random_seeds)

That said, RWMH is the algorithm of last resort these days, I would try a variant of HMC (eg NUTS), which should be orders of magnitude faster per effective sample.


A few things:

  1. This stylization does not appear to resolve the in place action of my RWM! code. Is there a way around this?
  2. Your suggestion just passes a different seed to each of the processes. I usually hear this approach to parallel RNG is ok, but not great.
  3. I know that RWM isn’t great, I’m just using this as a prototype for the kinds of problems that I am interested in doing, where I might want to run multiple independent chains in parallel.
1 Like

As an addition to what Tamas_Papp said, note that Mamba.jl implements NUTS among other algorithms, and does parallelization automatically if you have multiple Julia processes. Its plotting functionality didn’t work as advertised for me, but it’s still useful. If you can make your model fit their syntax I’d go for it.

Tamas has, which also implements NUTS too (without need for a framework).

1 Like

Ah, thanks. I hadn’t come across this before but for simple problems I rather like the more straightforward approach. I wouldn’t know what to do with most of the algorithms in Mamba anyway.

Presumably, you wrap the whole thing (allocating the storage for a chain, etc). That said, in my experience for anything nontrivial the log-posterior evaluation dominates in MCMC, and savings by preallocation are not worth the code complication.

OK, so use something better :wink:; this is orthogonal to parallelization.

I guess I’m thinking about problems where my random variable is a discretization of a function on some domain and my evaluation of the likelihood involves solving a PDE. To me, that seemed like a problem where I would really want to preallocate. But you’re saying that will turn out not to be the problem?

This goes beyond my simple RWM example. Suppose I am doing a study of different initial conditions to a time dependent 2D/3D PDE. I generate my initial conditions, each of which may be rather large, and then evolve each of them, independently. There, it would seem that I might want to avoid allocating (again) in order to be pmap compatiable.

Indeed, the RNG issue is orthogonal (or complementary). I know the RandomNumbers.jl includes random123 (D. E. Shaw Research: Research) which is supposed to be statistically safe for multithreading/mulitprocessing environments, but the authors don’t seem to have any examples of that usage. I was hoping maybe someone here had some experience with that kind of task. My own experience has been with SPRNG ( for C/C++ coding.

Preallocate for the evaluation of the log posterior if that makes sense for your problem — the cleanest way I found for this is making a callable struct.

My point was that preallocating for the chain is a very minor (if measurable at all) improvement, in most cases.

I am aware of the theoretical issue, but this is usually the least of my concerns when running MCMC on a nontrivial problem.