Tune Metropolis-Hastings (normal distribution) scale parameters (and algorithms parameters in general)

I am doing Bayesian inference of a few parameters (~3), albeit of a highly nonlinear model, and I am interested in the performance of random-walk Metropolis-Hastings versus Hamiltonian Monte Carlo methods. I don’t expect much from MH, but the traces seem particularly bad (below for one of the parameters, but the others are similar).

MH-trace

Looking at the API guides and the MH function help it seems like there is no default tuning of the scale of the proposal distribution, and in fact I couldn’t really find how to do it among the options.

So here’s the question:

  • Ideally I would like to use a normal random-walk proposal and tune the size of the jump by scaling the variance during warm-up, is there a way of achieving that?

Thanks in advance.

I don’t have an answer to your question, but I want to point out that the documentation you are referring to is out of date. Unfortunately, the documentation was moved to a different site, but the developers have been unable to remove the old site from Goggle searches. Here is the correct documentation. I don’t think it differs in this case, but it might give you more current information in other cases.

Thanks. I realized some things were completely out of date, and some links or searches maybe pointed to up to date docs (which I now see is a different page), I’ll make sure I’m always looking things up on that one.

Unfortunately in this case I don’t think the answer is there. It’s similar to the function help, and doesn’t seem to include any option for tuning a scale parameter for the jump size…

I looked through the repo for MH, but did not find anything about automatic tuning. But I can’t say for sure. The README and documentation show how you can manually specify the proposal distribution for each parameter.

Right. The thing is any fixed distribution will converge to some acceptance rate, but if it’s off of the “recommended” rate of \alpha \approx 0.23 (if I remember correctly) it has no way of changing the distribution to get closer to that target acceptance.
I realize Turing is probably designed to use advanced HMC samplers, but I’m interested in benchmarking the performance difference between samplers, and if I use a handicaped version of the simpler method it won’t be a fair comparison.

What’s the purpose of the exercise? Typically, a modern HMC like NUTS should beat MH by orders of magnitude (for ESS/time).

Nonlinearity per se is not very important, unless it leads to highly variant curvature of the log posterior. Then you should consider reparametrizing — but it is hard to suggest anything more specific without seeing the model.

1 Like

The model is a system of nonlinear ODEs that approximates an even more complex model; there’s little to no Bayesian inference done for this kind of model, so the purpose is comparing the performance of basic/old-school samplers with more advanced ones. So if HMC/NUTS beats MH by a large margin (and I believe it will) we want to know by how much and in which conditions/variants of the model and so on.

No, but it is more likely to lead to more rugged parameter surfaces. Reparameterizing could be an option, but that would require a lot of trial an error to do so and therefore manually “smooth” out the parameter surface. That’s the whole point of HMC, using the gradient and metric of the surface to efficiently sample it, right?
The MH approach to this problem is to adjust the scale of the jumps to adjust to the curvature in each dimension, and even with that performance is poor with certain geometries, so without it I think it’s pretty hopeless…
so not having it is a serious handicap to the method

HMC/NUTS is not magic, it cannot cope with posteriors which have highly varying second derivatives, and may be inefficient for mild cases. YMMV, you will only be in a position to assess this after you sample, but if you encounter this difficulty then reparametrizing is a good investment. The Stan manual has a couple of examples, also see discussions on the Stan forum.

Of course, but it’s a first-derivative approximation of the surface against nothing for MH, the only treatment of varying curvatures along different dimensions is tuning the scale of the normal distribution’s variance, so it.'s important that it’s there to address the improvement of the gradient-based approaches compared to this baseline.

I’m very familiar with Stan, and normally would have used it for anything Bayesian. This is the first time I am using Julia for any serious project, since it involves a modeling component and also non-HMC samplers which are unfortunately not implemented in Stan. But from what I gather the flexibility of the Metropolis-Hasting implementation is limited — since it’s a relatively simple method there may be other packages that allow it, and if not implementation from scratch is not difficult, though I’d prefer to be able to use something out of the box.

Anyway, Thanks again.

You might consider submitting a pull request to AdvancedMH.jl to add a tuning option.

1 Like

That sounds like a great idea.

1 Like