Symplectic integration and dynamic Hamiltonian Monte Carlo

In A Conceptual Introduction to Hamiltonian Monte Carlo, Michael Betancourt discusses the instability of numerically computing Hamiltonian trajectories and introduces symplectic integration, especially the “leapfrog” integrator, as a solution. Differential equations are something of a “killer app” in Julia, thanks to @ChrisRackauckas (and presumably others; sorry, I’m only aware of Chris in this domain).

This makes me curious… Does DifferentialEquations.jl have capabilities that are not available in other systems, that could advance the state of the art in MCMC?

Tagging @Tamas_Papp as well, since he likely has some insight from his work on DynamicHMC.jl.

2 Likes

The methods I would try on HMC would be those in Highly stable symplectic methods · Issue #82 · SciML/OrdinaryDiffEq.jl · GitHub. Most of DiffEq isn’t necessary for HMC, though, since there are no events and it’s a simple explicit RK loop. However, if we do end up doing fully adaptive symplectic integrators (which are quite difficult), then that might be something interesting to try in an HMC routine.

The issue is that HMC uses symplectic integration for a purpose, but doesn’t truly care about the integration itself. It uses the integration as a way to reduce sampling rejections, so it needs the area-preservation property, but reducing the error of the integration is not really a helpful goal in itself. Put another way, DiffEq is about being efficient, with efficiency defined as the speed to reach a given error vs. the true solution of the differential equation. For HMC, efficiency means doing as quick an integration step as possible while keeping the rejection rate down, which isn’t necessarily the same thing, so a lot of the more advanced integration schemes don’t end up actually being helpful in this domain.
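To make this concrete, here is a minimal sketch of the kind of explicit loop meant here (not code from DiffEq or any HMC package; `U`, `∇U`, `ε`, and `n_steps` are placeholder names): a fixed-stepsize leapfrog trajectory followed by a Metropolis correction. The integration error only enters through the acceptance probability, and volume preservation is what allows the simple exp(-ΔH) acceptance rule without a Jacobian term.

```julia
# Sketch only: one HMC transition with a fixed-stepsize leapfrog trajectory.
# U(q) is the negative log density and ∇U its gradient (placeholders).
function hmc_step(q, U, ∇U; ε = 0.1, n_steps = 32)
    p = randn(length(q))                    # resample momentum
    H0 = U(q) + sum(abs2, p) / 2            # initial energy
    q_new, p_new = copy(q), copy(p)
    for _ in 1:n_steps                      # plain explicit leapfrog loop
        p_new -= ε / 2 .* ∇U(q_new)
        q_new += ε .* p_new
        p_new -= ε / 2 .* ∇U(q_new)
    end
    H1 = U(q_new) + sum(abs2, p_new) / 2
    # The integration error only matters here: larger ΔH means more rejections.
    return rand() < exp(H0 - H1) ? q_new : q
end
```

Each step costs two gradient evaluations as written (one of which can be cached and reused by the next step), and everything else is a handful of axpy-style operations.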

6 Likes

I am not an expert on symplectic integrators, just a user. With that in mind, AFAIK HMC algorithms use them because, if the stepsize is “small enough”, you get errors with nicer properties without too many function evaluations (this is important, see below). Most importantly, the errors stay contained within a band (if things work out), as opposed to non-symplectic integrators, whose error can grow without bound over time.
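A toy illustration of the “contained within a band” behaviour (not from any package, just the standard textbook experiment): integrate a harmonic oscillator with leapfrog and with explicit Euler and compare the energy error after many steps.

```julia
# Harmonic oscillator H(q, p) = (q^2 + p^2) / 2, so ∇U(q) = q.
# Leapfrog keeps the energy error in a band; explicit Euler lets it drift.
function energy_errors(ε = 0.1, n = 10_000)
    H(q, p) = (q^2 + p^2) / 2
    q_lf, p_lf = 1.0, 0.0          # leapfrog state
    q_eu, p_eu = 1.0, 0.0          # explicit Euler state
    for _ in 1:n
        # leapfrog (symplectic)
        p_lf -= ε / 2 * q_lf
        q_lf += ε * p_lf
        p_lf -= ε / 2 * q_lf
        # explicit Euler (not symplectic)
        q_eu, p_eu = q_eu + ε * p_eu, p_eu - ε * q_eu
    end
    return H(q_lf, p_lf) - 0.5, H(q_eu, p_eu) - 0.5
end

energy_errors()  # the first error stays near zero, the second blows up
```

For this model the Euler energy is multiplied by (1 + ε²) every step, while the leapfrog energy error oscillates at order ε².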

Stepsize adaptation usually happens with a heuristic. It can be problematic if the logdensity has highly varying curvature, as the sampler cannot adapt well to the whole space. The practical solution for this is jittering the stepsize and hoping for the best; for truly nasty models, think about reparametrizations (cf. the folk theorem of statistical computing).

AFAIK modern (non-symplectic) integrators can have excellent properties in practice, and even if there are errors, the NUTS2 algorithm is built to cope with that; it is just most efficient when the errors are not too large. With the default settings of NUTS/DynamicHMC, we usually build up a tree with a doubling algorithm up to depth 5, which means 32 nodes.

In practical problems, the cost of the integrator is negligible (≪ 1%) compared to log density evaluations (with derivatives), as the latter dominate everything. Even if we had more interim nodes, the NUTS2 algorithm per se could easily cope with that.

So the real question is: compared to leapfrog, what accuracy would other algorithms achieve while traversing the same trajectory (we want to get far) with the same number of logdensity evaluations? I don’t know enough about numerical ODE methods to answer this, but if someone suspects they can do better than leapfrog on problems of varying curvature without drastically increasing the number of function evaluations, that could be a big win, as it would allow another layer of automation. Thinking about reparametrizations that “smooth” problems is probably the trickiest part of practical MCMC for complex models right now.
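As a rough sketch of the trade-off (placeholder names, not a concrete proposal): higher-order symplectic methods are typically built by composing leapfrog substeps. The classic fourth-order Yoshida composition below costs roughly three gradient evaluations per step instead of one, so it only pays off if it allows a stepsize more than about three times larger for the same rejection rate.

```julia
# One leapfrog step for H(q, p) = U(q) + p'p / 2.
# ∇U is a stand-in for the (expensive) logdensity gradient.
function leapfrog(q, p, ε, ∇U)
    p = p - ε / 2 .* ∇U(q)
    q = q + ε .* p
    p = p - ε / 2 .* ∇U(q)
    return q, p
end

# Fourth-order Yoshida scheme: three leapfrog substeps with coefficients w1, w0, w1.
# Still symplectic, higher order, but ≈ 3 gradient evaluations per step.
function yoshida4(q, p, ε, ∇U)
    w1 = 1 / (2 - 2^(1 / 3))
    w0 = 1 - 2w1                  # = -2^(1/3) / (2 - 2^(1/3))
    q, p = leapfrog(q, p, w1 * ε, ∇U)
    q, p = leapfrog(q, p, w0 * ε, ∇U)
    q, p = leapfrog(q, p, w1 * ε, ∇U)
    return q, p
end
```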

I built DynamicHMC to be somewhat modular; even though leapfrog is hardcoded ATM, it would not take much to modularize it. So if someone wants to experiment with such things, let me know.

4 Likes

This has been really helpful. @ChrisRackauckas, that is a good point about the objective being a bit different between the two domains. And @Tamas_Papp, your point about varying curvature is exactly what I had in mind, but I’m not an expert in this area either.

Anyway, it seems there’s no quick win here based on existing DiffEq code today, at least not without some deeper analysis and experimentation. Still, there’s some interesting potential, and it may be worth keeping an eye on for future opportunities.

Thank you both!

The challenge is that improving on the algorithm requires understanding of

  1. numerical methods for differential equations,
  2. differential geometry,
  3. MCMC and Bayesian models (both practical and theoretical).

and even in that case it is unclear that there is low-hanging fruit.

Improving the AD is orthogonal to this, but at the same time it is an area where Julia would have a comparative advantage.

What symplectic integrators are actually doing is something that’s global, as shown here:

So this kind of stepsize jittering actually gets rid of the symplectic property (which is why I said adaptive timestepping for symplectic integrators is very difficult). However, HMC uses a symplectic integrator because of its volume-preservation property, which means fewer rejections, not because of its long-term integration properties, so it’s probably fine for HMC even though it’s no longer a true integration.

That’s what methods like Highly stable symplectic methods · Issue #82 · SciML/OrdinaryDiffEq.jl · GitHub are supposed to do, by enlarging the stability region relative to the stepsize. I need to do some more research on this though, and I want to go to higher order.

My understanding is that a single stepsize is selected randomly for each trajectory, and it remains fixed within that trajectory.
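In code, that would look something like the following sketch (illustrative only, not any package’s actual implementation):

```julia
# One stepsize is drawn before each trajectory and then held fixed
# for every leapfrog step inside that trajectory.
ε_base = 0.1                          # adapted "typical" stepsize
n_trajectories = 100
for _ in 1:n_trajectories
    ε = ε_base * exp(0.1 * randn())   # jitter once per trajectory
    # ... run the whole leapfrog trajectory with this fixed ε ...
end
```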

Also, I think one would need to apply a Jacobian correction in the acceptance step if the integrator did not preserve volume.