Turing.jl/MCMC: Chains getting stuck

I want ask for advice or insight as to why my MCMC chains could be stuck even when using HMC on a uni-modal posterior.

I’m doing parameter calibration of pedestrian models (systems of non-linear ODEs) using Turing.jl. If we generate synthetic data using the model, then inferring the parameters from this data over a short time period (<1s) is a well-posed and well-conditioned problem, as there is little time for the pedestrians’ trajectories to be “chaotically” altered. However, extending the inference to times greater than a few seconds, the chains entirely fail to explore the parameter space and get completely stuck. That is to say, they still accept proposals, it’s just that these proposals are very close to each other, so the chain never moves.

I am confused as to why this would happen, given that I am using HMC (NUTS), which was specifically created to address the issue of non-exploration. The posterior in question looks like this:

image

Which doesn’t strike me as particularly difficult to identify. That said, I’m so new to MCMC that there could be something obvious that I’m missing. One thing I haven’t done is check the identifiability of my parameters using the relevant SciML tools.

Here’s what an example trace plot looks like for my problem:

I think that a MWE would be beneficial, otherwise it’s quite difficult to help.

I apologise for not having done so already. I will do my best, but I’ve been advised not to share too much of the code I’m using so I don’t know how useful it will ultimately be.

Then simulate som data, if you cannot reveal the real data. I think that would make more people able to help.

What sort of data would be useful? I have saved some chains saved, and some data from the output of my ODE.

Edit: I think I misinterpreted your comment: It’s not the data that is not shareable, but the code itself cannot be shared publicly due to how the automatic code plagiarism detection works. I can try to come up with a MWE but even then I’m not sure how useful it will be because the setup and model are quite particular.

Sorry for being off topic, but can you share the code to make the visualization of the posterior? I haven’t been able to figure out how to do so.

1 Like

No worries, the code for creating the visualisations can be found here:
https://turing.ml/dev/docs/using-turing/sampler-viz

There are a few things you need to change (names of variables) to get it to work for your model/chain, but it’s relatively straightforward.

1 Like

I can’t believe this, but the chains getting stuck seemed to be an issue of type-instability. After I checked the code with @code_warntype I discovered that there were some global variables over which I had defined closures. Hence the return type of my Turing model was Any.

I think this had an effect on the \epsilon heuristic for the leapfrog integrator. I might submit an issue on the Turing repo to find out where this came from.

The speedup after fixing this was about 30 times btw…

Never use global variables, kids.

16/04 Edit: This did not fix the problem entirely, however it made it possible to run the simulations faster and diagnose possible problems. I think the chains getting stuck is an issue of the \epsilon heuristic being tricked by a very rough posterior.

4 Likes