Hello! This is my first time trying to perform Bayesian inference, and I have found some problematic behavior when using Hamiltonian Monte Carlo. I’d like some help in figuring out what may be going wrong.
I have a likelihood \mathcal{L}(m | \theta,\phi), where m is a (discrete) possible outcome of an experiment parameterized by spherical angles \theta \in [0,\pi] and \phi \in [0,2\pi]. I hand-coded the transformations that implement this domain restriction: for \theta I use \theta \mapsto \arccos(\cos(\theta)), while for \phi it is simply \phi \mapsto \phi \bmod 2\pi (Julia's mod2pi). These transformations are piecewise linear, so there is no need to worry about a Jacobian correction. For my prior I use the uniform measure on the sphere, \sin(\theta) / 2\pi.
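To make this concrete, here is a stripped-down sketch of how the log-density is set up through the LogDensityProblems.jl interface. The names SpherePosterior and loglik are placeholders for my real model, not the actual code:

```julia
using LogDensityProblems

# Placeholder target: `loglik(m, θ, ϕ)` stands in for log ℒ(m | θ, ϕ).
struct SpherePosterior{F}
    observations::Vector{Int}
    loglik::F
end

function LogDensityProblems.logdensity(p::SpherePosterior, x)
    # Fold the unconstrained coordinates back into the domain.
    θ = acos(cos(x[1]))   # θ ∈ [0, π]
    ϕ = mod2pi(x[2])      # ϕ ∈ [0, 2π)
    # Uniform-on-the-sphere prior (up to a constant) plus independent observations.
    log(sin(θ)) + sum(m -> p.loglik(m, θ, ϕ), p.observations)
end

LogDensityProblems.dimension(::SpherePosterior) = 2
LogDensityProblems.capabilities(::Type{<:SpherePosterior}) =
    LogDensityProblems.LogDensityOrder{0}()
```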
Given a series of observations, which I assume to be independent, I need to sample from the posterior in order to obtain information about it. Note that my observations are simulated from a categorical distribution using Distributions.jl.
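Here is roughly how I generate them; outcome_probs is a made-up stand-in for the actual outcome probabilities of my experiment:

```julia
using Distributions, Random

# Hypothetical stand-in for the outcome probabilities as a function of the angles.
function outcome_probs(θ, ϕ)
    p = [abs2(cos(θ / 2)), abs2(sin(θ / 2)) * cos(ϕ / 2)^2, abs2(sin(θ / 2)) * sin(ϕ / 2)^2]
    p ./ sum(p)   # normalize so Categorical accepts the vector
end

θ_true, ϕ_true = 0.7, 4.0
rng = MersenneTwister(1)

# Independent draws from a categorical distribution over the possible outcomes.
observations = rand(rng, Categorical(outcome_probs(θ_true, ϕ_true)), 500)
```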
First, I tried Metropolis-Hastings (AdvancedMH.jl). It worked great for this case, but I’m also interested in higher-dimensional versions of this problem, and the algorithm didn’t generalize too well (too many samples were needed to get reasonable convergence).
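For reference, the MH run looks roughly like this (again a sketch, reusing the placeholder target and simulated data from above):

```julia
using AdvancedMH, Distributions, LinearAlgebra, LogDensityProblems

# Log-likelihood of a single outcome, built from the stand-in probabilities above.
loglik(m, θ, ϕ) = log(outcome_probs(θ, ϕ)[m])
ℓ = SpherePosterior(observations, loglik)

# Random-walk Metropolis with an isotropic Gaussian proposal.
model = DensityModel(x -> LogDensityProblems.logdensity(ℓ, x))
spl = RWMH(MvNormal(zeros(2), 0.1 * I))
mh_samples = sample(model, spl, 50_000)
```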
Then, I went on to try Hamiltonian Monte Carlo (AdvancedHMC.jl and DynamicHMC.jl), which, in theory, circumvents many of the problems of Metropolis-Hastings, but here I ran into some strange behavior. Most of the time I get the expected result, which is shown in the figure below:
In the background we have the rescaled posterior calculated over a grid of parameters. The black dots are samples from the distribution and the yellow cross is the true value of my parameters \theta,\phi.
Then, with the same set of observations, I run another chain and might get this sort of result:
or even this
It is as if HMC “saw” my posterior as multimodal, which, as the first figure shows, isn’t the case.
As I mentioned, this only shows up in HMC (both AdvancedHMC.jl and DynamicHMC.jl show the same behavior). When using AdvancedMH.jl, everything works great. I’m pretty sure that the same posterior is being used in all cases, as I’m using the LogDensityProblems.jl interface.
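For completeness, the AdvancedHMC.jl side is set up roughly like this (a sketch following the package's usual recipe, with gradients taken by ForwardDiff through the same placeholder target ℓ as above):

```julia
using AdvancedHMC, ForwardDiff

initial_x = [π / 2, π]                 # starting point in the unconstrained coordinates
n_samples, n_adapts = 2_000, 1_000

metric = DiagEuclideanMetric(2)
hamiltonian = Hamiltonian(metric, ℓ, ForwardDiff)   # same ℓ as passed to AdvancedMH

# NUTS-style setup with step-size and mass-matrix adaptation.
initial_ϵ = find_good_stepsize(hamiltonian, initial_x)
integrator = Leapfrog(initial_ϵ)
kernel = HMCKernel(Trajectory{MultinomialTS}(integrator, GeneralisedNoUTurn()))
adaptor = StanHMCAdaptor(MassMatrixAdaptor(metric), StepSizeAdaptor(0.8, integrator))

hmc_samples, stats = sample(hamiltonian, kernel, initial_x, n_samples, adaptor, n_adapts)
```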
I’d greatly appreciate it if someone could help me figure out what is going wrong. As already mentioned, this is my first contact with Bayesian inference, so there could be some basic facts that I’m not familiar with. Is there at least some sort of diagnostic I could run to check whether the sampling was indeed correct? As the setup is reasonably complex, I’m a little reluctant to put together an MWE, but let me know if you think you need more information.
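As for diagnostics, the only thing I currently know to look at is something along these lines (assuming MCMCChains.jl is an appropriate tool here):

```julia
using MCMCChains

# Collect the HMC draws into a Chains object (iterations × parameters) and
# look at the standard diagnostics (ESS, R̂, etc.). These are the unconstrained
# coordinates; they would still need to be folded back into [0,π] × [0,2π).
chn = Chains(permutedims(reduce(hcat, hmc_samples)), [:θ, :ϕ])
describe(chn)
```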
Thank you very much!