For the following code, which runs independent chains of Random Walk Metropolis, there is the obvious potential for parallelization. I am wondering what strategies people would recommend. This is related to a prior question I had about how to use `pmap`

when it was being applied to something which was an in place transformation.

```
function Boltzmann_likelihood(x, V, beta)
w = exp(-beta * V(x));
return w
end
function RWM!(X0, V::Vf, beta; niterations=10^2, Dt = 1.e-1) where {Vf}
Xp = similar(X0);
naccept = 0;
p0 = Boltzmann_likelihood(X0, V, beta);
gaussian_coef = sqrt(2 * Dt/beta);
for j = 1:niterations
@. Xp = X0 + gaussian_coef * randn();
pp = Boltzmann_likelihood(Xp, V, beta);
a = min(1, pp/p0);
#a = 1.
if rand()<a
naccept = naccept+1;
@. X0 = Xp;
p0 = pp;
end
end
X0
end
nsamples = 10^2;
d= 2;
beta = 1;
function V(x)
return (dot(x,x)-1)^2
end
X0vals = [randn(d) for i=1:nsamples];
Xbar= zeros(d);
for k = 1:nsamples
RWM!(X0vals[k], V, beta)
@. Xbar += X0vals[k]/nsamples;
end
println(Xbar);
```

Something like (very stylized)

```
random_seeds = [srand() for _ in 1:5];
function do_mcmc(rng)
# your code that does mcmc using the
# random number generator explicitly
end
chains = pmap(do_mcmc, random_seeds)
```

That said, RWMH is the algorithm of last resort these days, I would try a variant of HMC (eg NUTS), which should be orders of magnitude faster per effective sample.

4 Likes

As an addition to what Tamas_Papp said, note that `Mamba.jl`

implements NUTS among other algorithms, and does parallelization automatically if you have multiple Julia processes. Its plotting functionality didn’t work as advertised for me, but it’s still useful. If you can make your model fit their syntax I’d go for it.

Tamas has https://github.com/tpapp/DynamicHMC.jl, which also implements NUTS too (without need for a framework).

1 Like

Ah, thanks. I hadn’t come across this before but for simple problems I rather like the more straightforward approach. I wouldn’t know what to do with most of the algorithms in Mamba anyway.

Presumably, you wrap the whole thing (allocating the storage for a chain, etc). That said, in my experience for anything nontrivial the log-posterior evaluation dominates in MCMC, and savings by preallocation are not worth the code complication.

OK, so use something better ; this is orthogonal to parallelization.

I guess I’m thinking about problems where my random variable is a discretization of a function on some domain and my evaluation of the likelihood involves solving a PDE. To me, that seemed like a problem where I would really want to preallocate. But you’re saying that will turn out not to be the problem?

This goes beyond my simple RWM example. Suppose I am doing a study of different initial conditions to a time dependent 2D/3D PDE. I generate my initial conditions, each of which may be rather large, and then evolve each of them, independently. There, it would seem that I might want to avoid allocating (again) in order to be `pmap`

compatiable.

Indeed, the RNG issue is orthogonal (or complementary). I know the `RandomNumbers.jl`

includes random123 (D. E. Shaw Research: Research) which is supposed to be statistically safe for multithreading/mulitprocessing environments, but the authors don’t seem to have any examples of that usage. I was hoping maybe someone here had some experience with that kind of task. My own experience has been with SPRNG (http://www.sprng.org/) for C/C++ coding.

Preallocate for the evaluation of the log posterior if that makes sense for your problem — the cleanest way I found for this is making a callable `struct`

.

My point was that preallocating for the *chain* is a very minor (if measurable at all) improvement, in most cases.

I am aware of the theoretical issue, but this is usually the least of my concerns when running MCMC on a nontrivial problem.