Hamiltonian Monte Carlo on sphere

Gattu_Mytraya · July 28, 2024, 3:26pm

I have a PDF defined on (S^{2})^{D} that I want to sample from using Hamiltonian Monte Carlo (HMC). I need to calculate the expectation value of a function against this PDF, which is computationally expensive. Therefore, a smaller Monte Carlo chain (compared to Metropolis-Hastings) would be beneficial.

I can code both the PDF and its derivative. I aim to follow the geodesic Monte Carlo scheme described in section 3.1 of this paper while leveraging the NUTS implementations in DynamicHMC.jl or AdvancedHMC.jl.

What is the best way to achieve this?

gdalle · July 28, 2024, 3:39pm

@kellertuer ?

kellertuer · July 28, 2024, 4:25pm

Thanks for the ping @gdalle.

I am just a mere optimisation person, so I have (upper bounded by) zero knowledge about Monte Carlo nor Metropolis Hastings (besides having heard the names), and even less about NUTS

But Manifolds.jl does offer a sphere M=Sphere(n) (see here of dimension n and you can also get the power manifold (vectors of values on the sphere) N = (Sphere(n))^m (see here). So with that you have a few tools they mention.

We recently started ManifoldDiffEq.jl (see Home · ManifoldDiffEq.jl) so maybe you can even use the Euler method we provide there.

Again, since I am not an expert, I am not sure how far you get with that, but maybe that is a starting point.

Tamas_Papp · July 29, 2024, 7:33am

Can you extend your log posterior in a smooth way (continuous, at least once differentiable) to a small neighborhood around the sphere (ideally the whole Euclidean space that embeds it)? Then you can add a “penalty” term for the distance from the sphere surface, sample from there, and do importance resampling.

trahflow · July 29, 2024, 7:49am

Hi @Gattu_Mytraya ,
are you tied to HMC, or could you use other MCMC techniques that are efficient?
If so, then geodesic slice sampling as detailed in this paper could be an option. The advantage is that this is really easy to implement and has no hyperparameters that need to be tuned.

ShaanK · July 29, 2024, 8:21am

@Gattu_Mytraya As pointed by @trahflow do checkout the geodesic slice sampling paper. If you are simply interested in it’s implementation, you could also try to adapt our python package for your usage: geosss. Particularly the GeoSSS (shrink) method is the fastest which is implemented under geosss.ShrinkageSphericalSliceSampler

Gattu_Mytraya · July 29, 2024, 10:09am

I am not sure if this would be possible. My PDF is defined in terms of Wigner-D matrices. So terms like \log(\cos(\theta/2) ...) would appear in the Hamiltonian.

Gattu_Mytraya · July 29, 2024, 10:14am

My main concern is just reducing the length of the Monte Carlo chains. For example, for about the same level of accuracy, HMC has a chain length 100 times less than what I would need in Metropolis-Hastings (on the plane R^{2} -
I expect similar behaviour on the sphere). I will try to look at the paper you are referring to.

Gattu_Mytraya · July 29, 2024, 7:28pm

@trahflow @ShaanK
So, I implemented the algorithm from GitHub - microscopic-image-analysis/geosss: Python package implementing ideal and shrinkage-based geodesic slice samplers defined on the n-sphere.. It works wonderfully. I admit I found the paper pretty intimidating. Is there an intuitive way to understand what’s going on?

Gattu_Mytraya · July 29, 2024, 7:46pm

The poles seem to be avoided. The expectation here is a constant density indicated by the horizontal line.

ShaanK · July 29, 2024, 8:10pm

@Gattu_Mytraya Glad to hear it works well! Yeah unfortunately the paper is sort of heavy with perhaps a lot of mathematical jargon. We are currently revising the paper and hopefully the next revision is more digestible for a larger audience.

P.S. The above plot is from this method? The algorithm is designed to choose geodesics randomly, so it’s a bit odd that it is missing poles. We have tested this on several challenging targets and never observed that. Checkout the numerical experiments section of the paper.

Gattu_Mytraya · July 31, 2024, 11:32pm

@ShaanK The plot was generated using the “Geodesic Spherical Shrinkage Slice Sampler.” However, the case mentioned above appears to be an outlier. For other distributions I tested, the density remained constant within the error bars.

Additionally, I drew approximately 25,000 samples per chain across 100 chains post-burn-in (~10,000 samples). In most cases, the Effective Sample Size (ESS) was around 250 per chain, which seems quite low compared to Hamiltonian Monte Carlo (HMC), indicating higher correlations within each chain. Is this expected?

ShaanK · August 4, 2024, 4:25pm

@Gattu_Mytraya Huh. That is a bit strange. Our numerical tests with mixture of von Mises-Fischer distributions indicated consistently higher ESS for our methods as compared to spherical HMC. Could you provide me with a toy simulation of your problem, that I could try out with minimal hassle and compare?

Gattu_Mytraya · August 4, 2024, 6:22pm

@ShaanK
So the simplest example is the following (log) pdf:

function u_v_generator(θ::Vector{Float64}, ϕ::Vector{Float64})

    return cos.(θ ./ 2) .* exp.(0.5im .* ϕ), sin.(θ ./ 2) .* exp.(-0.5im .* ϕ)

end
function logpdf(θ::Vector{Float64}, ϕ::Vector{Float64})
    U, V = u_v_generator(θ, ϕ)

    logpdf = 0.0
    for i in 1:length(θ)-1
        for j in i+1:length(θ)
            logpdf += 3.0 * log(abs2(U[i]*V[j]-U[j]*V[i]))
        end
    end

    return logpdf

end

The function I was checking the ESS for was the following:

function energy(θ::Vector{Float64}, ϕ::Vector{Float64})
    U, V = u_v_generator(θ, ϕ)

    energy = 0.0
    for i in 1:length(θ)-1
        for j in i+1:length(θ)
            energy += 0.50 / abs(U[i]*V[j]-U[j]*V[i])
        end
    end

    return energy

end

\theta and \phi are the positions on the sphere (\theta is the latitude and \phi the longitude)

ShaanK · August 6, 2024, 3:52pm

@Gattu_Mytraya Sorry for the late response. Looking at your code, I am not sure how this pdf lives on \mathbb{S}^2. From your logpdf function (or energy), every U, V corresponds to a 2-sphere, which is somehow coupled. However, it looks like it is some N product of \mathbb{S}^2. \mathbb{S}^2 \times \mathbb{S}^2 \times ... \mathbb{S}^2 which is some sort of a hypertorus (not sure though!)? I think this is not equivalent to \mathbb{S}^D where the “geodesic slice sampling on the sphere” can operate. Also not sure how you manage to sample with our method. For any general distribution, perhaps there are better alternatives than HMC, although I am not aware of those.

@trahflow Any ideas?

EDIT:

Apologies. You mentioned in your question that is (\mathbb{S}^2)^D (I missed it.). However this is not equivalent to \mathbb{S}^N where N is the dimension of the sphere. Our algorithm only works for this case. Sorry for leading a bit astray there. I hope you find a better solution to your problem.

Gattu_Mytraya · August 6, 2024, 4:30pm

@ShaanK Thanks for the clarification!

Do you think your algorithm could be adapted to my problem by picking a vector of geodesic directions? That is, picking a random vector in the tangent space of a point on (\mathbb{S}^{2})^{D} to move along. However, I am unsure how the shrinking algorithm (which I have been thinking of as a bisection method) could be extended in this case.

Or perhaps something like Gibbs sampling could be performed.

To clarify the language, I have D particles living on the sphere. So perhaps I can pick a particle at random and then move that particle on the sphere using your procedure.

ShaanK · August 7, 2024, 9:28am

Gibbs sampling might be a possibility for every \mathbb{S}^2 over D, but again that could make it inefficient and would not serve your purpose of reducing the computational cost.

Tamas_Papp · August 7, 2024, 11:30am

I don’t know anything about Wigner d-matrices, but all that is required for NUTS to work is a continuous embedding, ie an extension of your PDF to the neighborhood of the sphere. For example, if S is the sphere, and f(x) is your original log pdf for x \in S, an extension is

\hat{f}(x) = f(x / \| x \|) + g(\| x \|)

where \lim_{z \to 0} g(z) = -\infty to keep it away from the origin, g(1) = 0, and ideally g would fall quickly when away from 1, and be convenient for importance resampling.

(Note however that this is a theoretical scheme, whether you get efficient sampling depends on the details).

Gattu_Mytraya · August 7, 2024, 5:28pm

I will try to use a function of the form g(r) = a\log r - (r-\alpha)^2/(2\sigma^{2}) such that 1 = \alpha + a\sigma^{2} (i.e. this would ensure g(r) is peaked at r=1) and get back to you.

bertschi · August 7, 2024, 7:13pm

There have been a lot of (recurring) discussions on the Stan discourse about this issue, e.g., here or there etc.

Topic		Replies	Views
MCMC for a customized distribution General Usage monte-carlo	3	777	February 2, 2021
Hamiltonian Monte Carlo gets stuck at tiny local maxima Probabilistic Programming question	14	1116	March 18, 2024
Using DynamicHMC to calculate paths of Quantum Simple Harmonic Oscillator Modelling & Simulations dynamichmc	2	566	April 26, 2021
Discretized HMC Probabilistic Programming	1	380	April 25, 2021
MCMC landscape Statistics question	99	12986	July 2, 2021

Hamiltonian Monte Carlo on sphere

Related topics