I need a maximum likelihood estimator for a distribution I have designed, but I cannot derive an explicit expression for it. Looking at another thread, Maximum likelihood for t-distribution by gradient descent, I see that Manopt.jl could be the package I need. I am a statistician but I have little knowledge of manifolds, so sorry if my questions seem trivial.
Question 1- The parameter of the likelihood function is a simple probability vector (this is the variable over which I need to maximize). Am I correct to assume that the ProbabilitySimplex is the manifold I’m working on?
Question 2- I have an explicit formula for the derivative of the likelihood with respect to each component of the parameter vector. But as I read the documentation, I see there is a “Riemannian” gradient as opposed to a “Euclidean” gradient. Is the Euclidean one just the vector of partial derivatives? Then what is the Riemannian one? It seems to be a projection onto the subspace orthogonal to p (in my case anyway). Is that correct? If so, no problem, I can program that. Please confirm I simply need to do this:
X = p ⊙ Y - ⟨p, Y⟩p,
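To sanity-check the formula above, here is a minimal pure-Julia sketch (no Manopt dependency; the function name is my own):

```julia
using LinearAlgebra

# Convert a Euclidean gradient Y at a point p on the probability simplex
# into a Riemannian gradient via
#     X = p ⊙ Y − ⟨p, Y⟩ p.
# The result is tangent to the simplex: its entries sum to zero,
# because sum(X) = ⟨p, Y⟩ − ⟨p, Y⟩ * sum(p) and sum(p) = 1.
riemannian_grad(p, Y) = p .* Y .- dot(p, Y) .* p

p = [0.5, 0.25, 0.25]        # a point on the simplex
Y = [1.0, 0.0, -1.0]         # some Euclidean gradient
X = riemannian_grad(p, Y)
@assert abs(sum(X)) < 1e-12  # tangent vectors on the simplex sum to 0
```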
Question 3- There seems to be a magic change_representer() that will also convert my gradient to a Riemannian one. But the example confuses me: it asks for an AbstractMetric and a representer of a linear function, and I am lost! I have stopped looking further for fear of wasting my time. How should I use this? Should I?
Question 4- I have looked at a few examples, and the functions to minimize and the gradients take the manifold as an argument but don’t seem to use it. For instance:
g(M, p) = log(det(p))^4 or
grad_g(M, p) = 4 * (log(det(p)))^3 * p.
Is M really needed? Not that I mind adding it to my function definitions, but I’d like to understand why.
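My current understanding (please correct me if wrong) is that the solvers call every cost and gradient uniformly as f(M, p), so functions that need manifold data can use it while simple ones just ignore the argument. A toy pure-Julia sketch of that calling convention (the ToySimplex type is my own illustration, not the real Manifolds.jl one):

```julia
# Toy illustration of the f(M, p) convention: the caller always passes
# the manifold, whether or not a particular cost needs it.
struct ToySimplex
    n::Int                    # ambient dimension
end
manifold_dim(M::ToySimplex) = M.n - 1

f(M, p) = sum(abs2, p)                      # ignores M entirely
g(M, p) = sum(abs2, p) / manifold_dim(M)    # uses manifold data

M = ToySimplex(3)
p = [0.5, 0.25, 0.25]
f(M, p)   # → 0.375
g(M, p)   # → 0.1875
```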
Question 5- There are many algorithms (solvers) available in Manopt.jl. How can I choose the best one? I don’t care about speed; I can go make myself a coffee meanwhile. It’s going to be faster than my image-processing stuff anyway. The likelihood I am trying to maximize is relatively simple and uses a table that I keep as a global dictionary. The gradient is not much fancier: simple brute-force stuff. Where can I find a discussion of how to pick a solver?
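For my own intuition about what a gradient-based solver does with these pieces, I wrote a minimal pure-Julia sketch of Riemannian gradient ascent on the simplex for a multinomial log-likelihood. The fixed step size and the multiplicative retraction are my own simplifications; a real solver would use proper step-size rules and stopping criteria. The multinomial MLE is counts / sum(counts), which gives a known answer to check against:

```julia
using LinearAlgebra

# Riemannian gradient ascent on the probability simplex for the
# multinomial log-likelihood sum(counts .* log.(p)).
function multinomial_mle(counts; η = 0.05, iters = 500)
    N = sum(counts)
    # Euclidean gradient is counts ./ p; applying X = p ⊙ Y − ⟨p, Y⟩ p
    # simplifies to counts − N * p.
    rgrad(p) = counts .- N .* p
    # A simple retraction that maps a tangent step back onto the simplex.
    retract(p, X) = (q = p .* exp.(X ./ p); q ./ sum(q))
    p = fill(1 / length(counts), length(counts))   # start uniform
    for _ in 1:iters
        p = retract(p, η * rgrad(p))
    end
    return p
end

multinomial_mle([2.0, 1.0, 1.0])   # ≈ [0.5, 0.25, 0.25]
```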
Question 6- I don’t care about speed, but I work with BigFloats (precision maniac). Do I have to revert to Float64?
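My impression is that generically written Julia code (nothing hard-coded to Float64) tends to just work at higher precision; whether every solver is fully generic is something I would have to verify. A quick check with the gradient conversion from Question 2:

```julia
using LinearAlgebra

rgrad(p, Y) = p .* Y .- dot(p, Y) .* p   # conversion from Question 2

p = BigFloat[1//2, 1//4, 1//4]           # exact dyadic point on the simplex
Y = BigFloat[1, 0, -1]
X = rgrad(p, Y)
@assert eltype(X) == BigFloat            # precision is preserved
@assert abs(sum(X)) < big"1e-70"         # still tangent, at full precision
```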
Question 7- I have not yet worked on speed improvements with multithreading. I assume it would not cause any problems if the likelihood and the gradient are computed with several threads. Correct? Have you used Manopt.jl with MPI? I may have to resort to that.
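To make Question 7 concrete, here is the kind of threaded cost function I have in mind, using a per-thread accumulation pattern with static scheduling (I have not tried MPI with this myself):

```julia
using Base.Threads

# Threaded multinomial log-likelihood sum(counts .* log.(p)).
# :static scheduling pins iterations to threads, so threadid() is a
# safe index into the per-thread partial sums.
function loglik(p, counts)
    partials = zeros(eltype(p), nthreads())
    @threads :static for i in eachindex(counts)
        partials[threadid()] += counts[i] * log(p[i])
    end
    return sum(partials)
end

p = [0.5, 0.25, 0.25]
counts = [2.0, 1.0, 1.0]
loglik(p, counts)   # same value as the serial sum(counts .* log.(p))
```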
I was very impressed by the flexibility of the package. Great work.