I have a PDF defined on (S^{2})^{D} that I want to sample from using Hamiltonian Monte Carlo (HMC). I need to calculate the expectation value of a function against this PDF, which is computationally expensive. Therefore, a smaller Monte Carlo chain (compared to Metropolis-Hastings) would be beneficial.
I can code both the PDF and its derivative. I aim to follow the geodesic Monte Carlo scheme described in section 3.1 of this paper while leveraging the NUTS implementations in DynamicHMC.jl or AdvancedHMC.jl.
I am just a mere optimisation person, so I have (upper bounded by) zero knowledge about Monte Carlo nor Metropolis Hastings (besides having heard the names), and even less about NUTS
But Manifolds.jl does offer a sphere M=Sphere(n) (see here of dimension n and you can also get the power manifold (vectors of values on the sphere) N = (Sphere(n))^m (see here). So with that you have a few tools they mention.
We recently started ManifoldDiffEq.jl (see Home · ManifoldDiffEq.jl) so maybe you can even use the Euler method we provide there.
Again, since I am not an expert, I am not sure how far you get with that, but maybe that is a starting point.
Can you extend your log posterior in a smooth way (continuous, at least once differentiable) to a small neighborhood around the sphere (ideally the whole Euclidean space that embeds it)? Then you can add a “penalty” term for the distance from the sphere surface, sample from there, and do importance resampling.
Hi @Gattu_Mytraya ,
are you tied to HMC, or could you use other MCMC techniques that are efficient?
If so, then geodesic slice sampling as detailed in this paper could be an option. The advantage is that this is really easy to implement and has no hyperparameters that need to be tuned.
@Gattu_Mytraya As pointed by @trahflow do checkout the geodesic slice sampling paper. If you are simply interested in it’s implementation, you could also try to adapt our python package for your usage: geosss. Particularly the GeoSSS (shrink) method is the fastest which is implemented under geosss.ShrinkageSphericalSliceSampler
I am not sure if this would be possible. My PDF is defined in terms of Wigner-D matrices. So terms like \log(\cos(\theta/2) ...) would appear in the Hamiltonian.
My main concern is just reducing the length of the Monte Carlo chains. For example, for about the same level of accuracy, HMC has a chain length 100 times less than what I would need in Metropolis-Hastings (on the plane R^{2} -
I expect similar behaviour on the sphere). I will try to look at the paper you are referring to.
@Gattu_Mytraya Glad to hear it works well! Yeah unfortunately the paper is sort of heavy with perhaps a lot of mathematical jargon. We are currently revising the paper and hopefully the next revision is more digestible for a larger audience.
P.S. The above plot is from this method? The algorithm is designed to choose geodesics randomly, so it’s a bit odd that it is missing poles. We have tested this on several challenging targets and never observed that. Checkout the numerical experiments section of the paper.
@ShaanK The plot was generated using the “Geodesic Spherical Shrinkage Slice Sampler.” However, the case mentioned above appears to be an outlier. For other distributions I tested, the density remained constant within the error bars.
Additionally, I drew approximately 25,000 samples per chain across 100 chains post-burn-in (~10,000 samples). In most cases, the Effective Sample Size (ESS) was around 250 per chain, which seems quite low compared to Hamiltonian Monte Carlo (HMC), indicating higher correlations within each chain. Is this expected?
@Gattu_Mytraya Huh. That is a bit strange. Our numerical tests with mixture of von Mises-Fischer distributions indicated consistently higher ESS for our methods as compared to spherical HMC. Could you provide me with a toy simulation of your problem, that I could try out with minimal hassle and compare?
@ShaanK
So the simplest example is the following (log) pdf:
function u_v_generator(θ::Vector{Float64}, ϕ::Vector{Float64})
return cos.(θ ./ 2) .* exp.(0.5im .* ϕ), sin.(θ ./ 2) .* exp.(-0.5im .* ϕ)
end
function logpdf(θ::Vector{Float64}, ϕ::Vector{Float64})
U, V = u_v_generator(θ, ϕ)
logpdf = 0.0
for i in 1:length(θ)-1
for j in i+1:length(θ)
logpdf += 3.0 * log(abs2(U[i]*V[j]-U[j]*V[i]))
end
end
return logpdf
end
The function I was checking the ESS for was the following:
function energy(θ::Vector{Float64}, ϕ::Vector{Float64})
U, V = u_v_generator(θ, ϕ)
energy = 0.0
for i in 1:length(θ)-1
for j in i+1:length(θ)
energy += 0.50 / abs(U[i]*V[j]-U[j]*V[i])
end
end
return energy
end
\theta and \phi are the positions on the sphere (\theta is the latitude and \phi the longitude)
@Gattu_Mytraya Sorry for the late response. Looking at your code, I am not sure how this pdf lives on \mathbb{S}^2. From your logpdf function (or energy), every U, V corresponds to a 2-sphere, which is somehow coupled. However, it looks like it is some N product of \mathbb{S}^2. \mathbb{S}^2 \times \mathbb{S}^2 \times ... \mathbb{S}^2 which is some sort of a hypertorus (not sure though!)? I think this is not equivalent to \mathbb{S}^D where the “geodesic slice sampling on the sphere” can operate. Also not sure how you manage to sample with our method. For any general distribution, perhaps there are better alternatives than HMC, although I am not aware of those.
Apologies. You mentioned in your question that is (\mathbb{S}^2)^D (I missed it.). However this is not equivalent to \mathbb{S}^N where N is the dimension of the sphere. Our algorithm only works for this case. Sorry for leading a bit astray there. I hope you find a better solution to your problem.
Do you think your algorithm could be adapted to my problem by picking a vector of geodesic directions? That is, picking a random vector in the tangent space of a point on (\mathbb{S}^{2})^{D} to move along. However, I am unsure how the shrinking algorithm (which I have been thinking of as a bisection method) could be extended in this case.
Or perhaps something like Gibbs sampling could be performed.
To clarify the language, I have D particles living on the sphere. So perhaps I can pick a particle at random and then move that particle on the sphere using your procedure.
Gibbs sampling might be a possibility for every \mathbb{S}^2 over D, but again that could make it inefficient and would not serve your purpose of reducing the computational cost.
I don’t know anything about Wigner d-matrices, but all that is required for NUTS to work is a continuous embedding, ie an extension of your PDF to the neighborhood of the sphere. For example, if S is the sphere, and f(x) is your original log pdf for x \in S, an extension is
\hat{f}(x) = f(x / \| x \|) + g(\| x \|)
where \lim_{z \to 0} g(z) = -\infty to keep it away from the origin, g(1) = 0, and ideally g would fall quickly when away from 1, and be convenient for importance resampling.
(Note however that this is a theoretical scheme, whether you get efficient sampling depends on the details).
I will try to use a function of the form g(r) = a\log r - (r-\alpha)^2/(2\sigma^{2}) such that 1 = \alpha + a\sigma^{2} (i.e. this would ensure g(r) is peaked at r=1) and get back to you.