Need help on Manopt

Hello

I must find a maximum likelihood estimator for a distribution I have designed, but I can’t get an explicit expression for it. Looking at another thread, Maximum likelihood for t-distribution by gradient descent, I see that Manopt.jl could be the package I need. I am a statistician but have little knowledge of manifolds, so sorry if my questions seem trivial.

Question 1- The parameter of the likelihood function is a simple probability vector (this is the variable over which I need to maximize). Am I correct to assume that the ProbabilitySimplex is the manifold I’m working on?

Question 2- I have an explicit formula for the derivative of the likelihood with respect to each entry of the parameter vector. But, as I read the documentation, I see there is something like a “Riemannian” gradient as opposed to a “Euclidean” gradient. Is the Euclidean one the derivatives? Then what is the Riemannian one? It seems to be the projection onto the subspace orthogonal to p (in my case anyway). Is that correct? If true, no problem, I can program that. Please confirm I simply need to do this:

    X = p ⊙ Y - ⟨p, Y⟩ p

Question 3- There seems to be a magic change_representer() that will also change my gradient to a Riemannian one. But the example confuses me. It asks for an AbstractMetric and a representer of a linear function. I am lost! I have stopped looking further for fear of wasting my time. How should I use this? Should I?

Question 4- I have looked at a few examples, and the functions to minimize and their gradients have the manifold as an argument but don’t seem to use it. For instance: g(M, p) = log(det(p))^4 or grad_g(M, p) = 4 * (log(det(p)))^3 * p. Is M really needed? Not that I mind adding this to my function definitions, but I’d like to understand why.

Question 5- There are many algorithms (solvers) available in Manopt.jl. How can I choose the best one? I don’t care about speed, I can go make myself a coffee meanwhile. It’s going to be faster than my image processing stuff anyway. The likelihood I am trying to maximize is relatively simple and uses a table that I keep as a global dictionary. The gradient is not much fancier. Simple brute force stuff. Where can I find a discussion on the best solver?

Question 6- I don’t care about speed but I work with BigFloats (precision maniac). Do I have to revert to Float64?

Question 7- I have not yet worked on speed improvement with multithreading. I assume this would not cause any problem if the likelihood and the gradient are computed with several threads. Correct? Have you used it with MPI? I may have to resort to that.

I was very impressed by the flexibility of the package. Great work.

I am not a statistician, so bear with me if I get some statistics wrong

As far as I understand, yes. Keep in mind that Manopt.jl, like most optimization packages, minimizes functions, so you have to rewrite your maximisation of (say) a function f into minimisation of -f.
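For example, here is a minimal sketch with a multinomial-style stand-in log-likelihood (your actual likelihood will differ; the counts c and the starting point are made up):

    using Manopt, Manifolds

    M = ProbabilitySimplex(3)          # probability vectors of length 4
    c = [10, 20, 30, 40]               # made-up observed counts

    # maximise sum(c .* log.(p)) by minimising its negative
    f(M, p)      = -sum(c .* log.(p))
    grad_f(M, p) = -(c .- sum(c) .* p) # Riemannian gradient of the cost, cf. Q2/Q3 below

    p_mle = gradient_descent(M, f, grad_f, fill(0.25, 4))   # should approach c / sum(c)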

This section of the documentation describes the relation between Euclidean and Riemannian gradients. For manifolds that inherit the metric from the embedding – e.g. the sphere – it is really just a projection;
if the metric is different, you also have to use change_representer (after the projection), but this is already implemented in Manifolds.jl, see Probability simplex · Manifolds.jl.

Which example? The AD chapter above gives you the derivation of the Riemannian gradient (from the Euclidean one).
change_representer(Manifold, Metric, point, gradient) needs the following information:

  • the manifold you are on
  • the metric in the embedding, which in your case (and very often) is EuclideanMetric()
  • the point you are at
  • the tangent vector (the evaluated gradient, i.e. the Euclidean one) you want to change.

This function does what you aim to implement in Q2, so if you want to implement that yourself (i.e. implement the Riemannian gradient yourself) – sure, that is also a way to go.
We plan to make all this a bit easier soon (see Implement a ManifoldEuclideanGradientObjective · Issue #217 · JuliaManifolds/Manopt.jl · GitHub), such that you really just have to provide a Euclidean gradient, but since I am the only main developer of Manopt.jl there are limits to how fast I can realise ideas.
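
Concretely, on the probability simplex the conversion could look like this (a sketch; Y is a stand-in for your evaluated Euclidean gradient at p):

    using Manifolds, LinearAlgebra

    M = ProbabilitySimplex(3)
    p = fill(0.25, 4)               # a point on the simplex
    Y = randn(4)                    # stand-in for the evaluated Euclidean gradient at p

    # let Manifolds.jl do it: project onto the tangent space, then change the representer
    X = change_representer(M, EuclideanMetric(), p, project(M, p, Y))

    # or implement the formula from Q2 by hand
    X_manual = p .* Y .- dot(p, Y) .* p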

Think of the manifold as the domain for the function. By convention all functions in Manopt.jl, e.g. the cost function, the gradient, the Hessian, or proximal maps, have the manifold as their first parameter.
The simple reason is that you can use exp(M,p,X) (the exponential map) or the distance independently of your manifold. For example, in the starting tutorial you can just take f and grad_f and even use them to compute the Riemannian center of mass on any manifold you like (you just have to provide your own data vector).
So the manifold as first argument is just a convention in Manopt.jl which eases definition of functions.
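
For illustration, a minimal sketch reusing the g and grad_g from Q4 on the manifold of symmetric positive definite matrices (grad_g is taken to be the Riemannian gradient, as in the example you quote):

    using Manopt, Manifolds, LinearAlgebra

    M = SymmetricPositiveDefinite(2)

    # cost and (Riemannian) gradient from Q4 – M is passed by convention but not used in the body
    g(M, p)      = log(det(p))^4
    grad_g(M, p) = 4 * (log(det(p)))^3 * p

    p0 = 2.0 * Matrix{Float64}(I, 2, 2)     # a starting point on M with det(p0) ≠ 1
    q  = gradient_descent(M, g, grad_g, p0)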

That depends tremendously on (a) the information available about your optimisation problem and (b) the manifold you are on.
For example

  1. You just have a cost function, but no gradient available. Then you can (a) use finite differences to approximate the gradient and use gradient_descent, or (b) use something like particle_swarm, which does not need a gradient, but (b) is usually much, much slower.
  2. If you have a gradient, then something like quasi_newton is probably even faster than gradient_descent, but it requires, for example, a parallel transport (or more generally a vector transport) to be implemented on your manifold. This means that you need a way to “move tangent vectors” to a new point.
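
Roughly, the calling patterns of the solvers just mentioned look as follows (a toy sketch on a flat Euclidean manifold, just to show the calls; the same signatures apply on your manifold, subject to the requirements above):

    using Manopt, Manifolds

    M  = Euclidean(2)
    f(M, p)      = sum((p .- [1.0, 2.0]) .^ 2)   # toy cost with minimiser [1, 2]
    grad_f(M, p) = 2 .* (p .- [1.0, 2.0])
    p0 = zeros(2)

    p1 = particle_swarm(M, f)                  # derivative-free, usually much slower
    p2 = gradient_descent(M, f, grad_f, p0)    # needs a gradient
    p3 = quasi_Newton(M, f, grad_f, p0)        # needs a gradient and a vector transport on M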

The whole JuliaManifolds org tries to be agnostic of the type you prefer to use. The points p and tangent vectors X are usually never typed, which allows you to use BigFloats if you like. The only limit sometimes might be that internally used functions do not support that (something from LinearAlgebra, for example) – but so far I am not aware of anything breaking. The same holds for StaticArrays if you want to use those instead.

We have not yet checked with MPI, but the answer is the same as in 6 – by design it should basically just work. I would first care a bit more about the choice of algorithm from Q5, though, before turning to MPI. If you want to try MPI and run into problems – feel free to get in touch. If you have a nice working example with MPI, we would be interested as well.
The plain reason we did not yet try that ourselves is that in both Manifolds.jl and Manopt.jl there are still enough manifolds and algorithms left to cover, as well as user-experience improvements.

Thanks :slight_smile: If you feel the docs could be improved to answer your questions directly, let me know.


The short version of Riemannian gradients is that you just compute the Euclidean gradient, and use riemannian_gradient(M, p, X), where M, p and X are what Ronny described. There should be no need to delve into more details, and you shouldn’t need to use change_representer directly.
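
In code this is something like the following sketch (riemannian_gradient is defined in ManifoldDiff.jl, so the qualified name is used here in case it is not re-exported):

    using Manifolds, ManifoldDiff

    M = ProbabilitySimplex(3)
    p = fill(0.25, 4)               # current point on the simplex
    Y = randn(4)                    # stand-in for the evaluated Euclidean gradient at p
    X = ManifoldDiff.riemannian_gradient(M, p, Y)   # corresponding Riemannian gradient at p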

There are a few functions that don’t work with BigFloat. Some of them just require loading GenericLinearAlgebra.jl. The only thing I encountered where that wasn’t enough was cases where matrix exponentials and logarithms need to be computed. I have a small script that fixes it. I didn’t add it to Manifolds.jl because it requires another dependency. Generally, if anything doesn’t work with BigFloat, feel free to just open an issue.
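
For instance (a sketch with a multinomial-style stand-in cost; the point is only that nothing in the call forces Float64):

    using Manopt, Manifolds
    using GenericLinearAlgebra      # helps some LinearAlgebra routines work with BigFloat

    M  = ProbabilitySimplex(3)
    c  = [10, 20, 30, 40]
    f(M, p)      = -sum(c .* log.(p))
    grad_f(M, p) = -(c .- sum(c) .* p)
    p0 = fill(big"0.25", 4)                      # BigFloat starting point on the simplex

    p_mle = gradient_descent(M, f, grad_f, p0)   # iterates and result stay BigFloat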


I was a bit unsure whether we are exporting it already :wink:

@kellertuer , @mateuszbaran
Thanks a lot for your help. I think I should be able to complete my work with your information.

Before I close the task and indicate it as solved, I have a last question.

Could you recommend an undergraduate-level book on manifolds? The topic looks fascinating and I’d like to dig deeper into it. Please, no URLs, I’m too old for that, I need ink-and-paper stuff.

I found a few minor typos in both the Manopt and Manifolds documents. If you wish, I can proofread them and let you know.

It depends a bit on what your goal is; usually undergrad is a bit early to cover all the involved details of manifolds, differential geometry, and optimization.

My recommendations are

or for (just) differential geometry

We try to document every keyword, and when we have discussions we usually turn them into notes or tutorials, but sometimes it is hard to proofread something once you have thought about it for a while – and later one forgets to proofread again.
So any PR or note on typos is always welcome.
And as I said if you feel some tutorial would help you, we can also work on that.


Great.

Riemannian geometry is what I am looking for.

No, what is usually referred to as the “likelihood” is the conditional probability p(D | \theta), but for fixed data D, i.e., considered as a function of the parameter vector \theta. Thus, the space/manifold the sampling space lives in plays no role; instead, the structure of the parameter space matters. Usually this is a product of several different manifolds (see the sketch after this list), e.g.,

  • Euclidean for unconstrained parameters such as location \mu
  • SymmetricPositiveDefinite for covariance matrix parameters \Sigma
  • PositiveVectors e.g. for the concentration parameter \alpha of a Dirichlet distribution
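
In Manifolds.jl such a product parameter space could, for example, be written like this (a sketch; the sizes are arbitrary):

    using Manifolds

    # e.g. a location in R^2, a 2×2 covariance matrix, and a positive concentration vector of length 3
    M = ProductManifold(Euclidean(2), SymmetricPositiveDefinite(2), PositiveVectors(3))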

In principle, Riemannian gradient descent taking the manifold structure of parameters into account should be better. In practice, it might be easier to use an explicit transformation from an unconstrained \mathbb{R}^n space onto the parameter manifold and use standard Euclidean optimization on that space instead. TransformVariables.jl has several transformations predefined and also provides their log-Jacobian corrections (even though these can be ignored when doing maximum likelihood estimation).


Hello @bertschi

I must not have expressed myself correctly. Everything you wrote is correct, and it is exactly what I am doing. My parameter is a vector of positive numbers summing to 1, like a multinomial (but it is not). The sample is a vector of observed frequencies.

I do not assume any prior distribution for the parameters; I want to maximize the likelihood of the sample the classical way (no Dirichlet or other prior). I’m not a fan of Bayesian statistics, but that’s another story.

So we are saying the same thing.

Next step: convergence and asymptotic distribution. The hard part.

These are very good recommendations. I would just like to add that information geometry explores many interesting connections between statistics and geometry. I like this book: Information Geometry | SpringerLink.


Must have missed that your parameter lives on a simplex itself. Then, you can indeed either try Riemannian optimization or the UnitSimplex transformation.
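
For the transformation route, a sketch with TransformVariables.jl (assuming UnitSimplex(n) maps an unconstrained vector of length n-1 to a probability vector of length n, and loglikelihood is your own function):

    using TransformVariables

    t = UnitSimplex(4)                 # output: a probability vector of length 4
    y = randn(dimension(t))            # unconstrained parameters, here of length 3
    p = transform(t, y)                # positive entries summing to 1

    # then maximise the likelihood in y with any Euclidean optimiser,
    # e.g. by minimising y -> -loglikelihood(transform(t, y))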


OK. I will receive my book on Manifolds next week and read it first. I’ll keep your recommendation for later. It looks interesting indeed.

Which book did you decide on getting?

Lee, Introduction to Smooth Manifolds. After reading a review of Do Carmo saying that the book assumes some prior knowledge of manifolds, I decided to settle for an introduction first. I’ll see after that. I am pretty sure I can find the book at the university library. But the book on Information Geometry will probably grab my attention afterwards.