Parallel mc recommendation

gideonsimpson · May 4, 2018, 5:11pm

For the following code, which runs independent chains of Random Walk Metropolis, there is obvious potential for parallelization. I am wondering what strategies people would recommend. This is related to a prior question I had about how to use pmap with a function that performs an in-place transformation.


function Boltzmann_likelihood(x, V, beta)
    w = exp(-beta * V(x));
    return w
end

function RWM!(X0, V::Vf, beta; niterations=10^2, Dt = 1.e-1) where {Vf}

    Xp = similar(X0);

    naccept = 0;
    p0 = Boltzmann_likelihood(X0, V, beta);

    gaussian_coef =  sqrt(2 * Dt/beta);

    for j = 1:niterations

        @. Xp = X0 + gaussian_coef * randn();

        pp = Boltzmann_likelihood(Xp, V, beta);
        a = min(1, pp/p0);
        #a = 1.

        if rand()<a
            naccept = naccept+1;
            @. X0 = Xp;
            p0 = pp;
        end
    end
    X0
end

nsamples = 10^2;
d= 2;
beta = 1;

using LinearAlgebra  # provides dot on Julia 1.0 and later

function V(x)
    return (dot(x,x)-1)^2
end

X0vals = [randn(d) for i=1:nsamples];
Xbar= zeros(d);


for k = 1:nsamples
    RWM!(X0vals[k], V, beta)

    @. Xbar += X0vals[k]/nsamples;
end

println(Xbar);

Tamas_Papp · May 4, 2018, 7:22pm

Something like (very stylized)

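using Distributed, Random

# one independently seeded generator per chain (srand was removed in Julia 1.0)
random_seeds = [MersenneTwister(seed) for seed in 1:5];

function do_mcmc(rng)
    # your code that does mcmc using the
    # random number generator explicitly
end

chains = pmap(do_mcmc, random_seeds)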

That said, RWMH is the algorithm of last resort these days. I would try a variant of HMC (e.g. NUTS), which should be orders of magnitude faster per effective sample.
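
For readers who want something runnable, here is a minimal sketch of the stylized pmap approach applied to the RWM example from the first post. It assumes Julia 1.x; `run_chain` and its keyword arguments are hypothetical names, and each chain allocates its own state inside the task instead of mutating a caller-owned array.

```julia
using Distributed
# addprocs(4)  # uncomment to use worker processes; without it pmap runs on the main process

@everywhere using Random, LinearAlgebra

@everywhere V(x) = (dot(x, x) - 1)^2

# Hypothetical wrapper: one full RWM chain per seed, returning the final state
@everywhere function run_chain(seed; d=2, beta=1.0, niterations=10^2, Dt=1e-1)
    rng = MersenneTwister(seed)        # explicit generator, one per chain
    X = randn(rng, d)                  # initial condition
    Xp = similar(X)                    # proposal buffer, allocated once per chain
    p0 = exp(-beta * V(X))
    coef = sqrt(2 * Dt / beta)
    for _ in 1:niterations
        for i in eachindex(X)
            Xp[i] = X[i] + coef * randn(rng)
        end
        pp = exp(-beta * V(Xp))
        if rand(rng) < min(1, pp / p0) # Metropolis accept/reject
            copyto!(X, Xp)
            p0 = pp
        end
    end
    return X
end

seeds = 1:100
chains = pmap(run_chain, seeds)        # one final state per chain
Xbar = sum(chains) / length(chains)
println(Xbar)
```

Passing integer seeds instead of generator objects keeps the data sent to each worker small, and rerunning with the same seed reproduces the same chain.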

gideonsimpson · May 4, 2018, 8:12pm

A few things:

This stylization does not appear to resolve the in place action of my RWM! code. Is there a way around this?
Your suggestion just passes a different seed to each of the processes. I usually hear this approach to parallel RNG is ok, but not great.
I know that RWM isn’t great, I’m just using this as a prototype for the kinds of problems that I am interested in doing, where I might want to run multiple independent chains in parallel.

Sean_McBane · May 5, 2018, 12:21pm

As an addition to what Tamas_Papp said, note that Mamba.jl implements NUTS among other algorithms, and does parallelization automatically if you have multiple Julia processes. Its plotting functionality didn’t work as advertised for me, but it’s still useful. If you can make your model fit their syntax I’d go for it.

mauro3 · May 5, 2018, 12:32pm

Tamas has https://github.com/tpapp/DynamicHMC.jl, which also implements NUTS (without the need for a framework).

Sean_McBane · May 5, 2018, 12:44pm

Ah, thanks. I hadn’t come across this before but for simple problems I rather like the more straightforward approach. I wouldn’t know what to do with most of the algorithms in Mamba anyway.

Tamas_Papp · May 7, 2018, 8:01am

Presumably, you wrap the whole thing (allocating the storage for a chain, etc). That said, in my experience for anything nontrivial the log-posterior evaluation dominates in MCMC, and savings by preallocation are not worth the code complication.

OK, so use something better; this is orthogonal to parallelization.
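
A rough sketch of what wrapping an in-place function for pmap could look like. The function `evolve!` is a hypothetical stand-in for RWM!, not code from this thread. The point is that with worker processes the mutation happens on a worker-side copy of the argument, so the wrapper has to return the result instead of relying on the caller's array being modified.

```julia
using Distributed

# Hypothetical in-place update standing in for RWM!; it doubles every entry
@everywhere function evolve!(x)
    x .*= 2
    return x
end

X0vals = [fill(Float64(i), 2) for i in 1:4]

# Copy, mutate the copy, and return it. The explicit copy keeps the behavior
# the same whether pmap runs on the main process or on remote workers.
results = pmap(x -> evolve!(copy(x)), X0vals)
println(results)
```

The originals in `X0vals` are left untouched, and `results` holds the evolved states in the same order as the inputs.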

gideonsimpson · May 7, 2018, 12:00pm

I guess I’m thinking about problems where my random variable is a discretization of a function on some domain and my evaluation of the likelihood involves solving a PDE. To me, that seemed like a problem where I would really want to preallocate. But you’re saying that will turn out not to be the problem?

This goes beyond my simple RWM example. Suppose I am doing a study of different initial conditions for a time-dependent 2D/3D PDE. I generate my initial conditions, each of which may be rather large, and then evolve each of them independently. There, it would seem that I might want to avoid allocating (again) while still being pmap compatible.

Indeed, the RNG issue is orthogonal (or complementary). I know that RandomNumbers.jl includes Random123 (from D. E. Shaw Research), which is supposed to be statistically safe for multithreading/multiprocessing environments, but the authors don’t seem to have any examples of that usage. I was hoping maybe someone here had some experience with that kind of task. My own experience has been with SPRNG (http://www.sprng.org/) for C/C++ coding.
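
One option that needs only the standard library, sketched here as an alternative to Random123 or SPRNG rather than an example of either: `Future.randjump` advances a `MersenneTwister` by a fixed number of steps, which gives each chain its own non-overlapping segment of a single stream. The helper name `jumped_rngs` and the seed are assumptions made for illustration.

```julia
using Random, Future

# Build n MersenneTwister generators, each jumped 10^20 steps past the previous one
function jumped_rngs(seed::Integer, n::Integer)
    rngs = [MersenneTwister(seed)]
    for _ in 2:n
        push!(rngs, Future.randjump(rngs[end], big(10)^20))
    end
    return rngs
end

rngs = jumped_rngs(2018, 4)
draws = [randn(rng) for rng in rngs]   # one draw from each stream
println(draws)
```

Each generator can then be handed to a separate chain, in the same way as the explicitly passed `rng` in the stylized pmap example.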

Tamas_Papp · May 7, 2018, 12:07pm

Preallocate for the evaluation of the log posterior if that makes sense for your problem — the cleanest way I found for this is making a callable struct.

My point was that preallocating for the chain is a very minor (if measurable at all) improvement, in most cases.

I am aware of the theoretical issue, but this is usually the least of my concerns when running MCMC on a nontrivial problem.
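
To illustrate the callable struct idea, a minimal sketch follows. The type `BufferedLogDensity` and its fields are hypothetical, and the buffer update is only a placeholder for an expensive intermediate such as a PDE solve.

```julia
# Hypothetical log density that owns a preallocated work buffer
struct BufferedLogDensity{T}
    beta::T
    buffer::Vector{T}
end

# Convenience constructor: allocate a buffer of length d once, up front
BufferedLogDensity(beta::T, d::Integer) where {T} = BufferedLogDensity(beta, zeros(T, d))

# Making the struct callable lets it be passed wherever a function V is expected
function (ld::BufferedLogDensity)(x)
    ld.buffer .= x .^ 2                       # reuse the buffer, no new allocation
    return -ld.beta * (sum(ld.buffer) - 1)^2  # log of exp(-beta * V(x))
end

logp = BufferedLogDensity(1.0, 2)
println(logp([1.0, 1.0]))
```

Because the buffer is mutated on every call, each parallel chain should construct its own instance instead of sharing one.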
