I have an optimisation problem which I solve with Optimization.jl using IPNewton(). In order to reduce the space of solutions to a more physically reasonable set, I want to introduce some constraints. I do this using the cons keyword, but I get errors which suggest that the eigenvalue decomposition in the constraining function does not work with AutoDiff. I was wondering, is there a restriction as to which functions can appear in the constraining function ? Or perhaps this is an IPNewton() specific problem ? If so, would the good folk here have other suggestions for solvers which can take moderately complicated constraints ?
Thanks. The constraining function is quite complex, and there is no clear closed-form Jacobian or Hessian. Do you know any examples where AutoDiff libraries are used instead ?
I am solving a simple problem: given a set of points {xi} in Rn, to fit a hyperellipsoid E(mu,A) which minimises a distance metric d(x,y). For now, I am using square-Euclidean distances.
I have split the problem into two sub-parts.
Part 1: Finding the distance d(p,E) of a given point p from a given ellipse E(mu,A). This requires nonlinear constraints, and I solved it by minimising d(x,y) with a constraint that y lies on E. I used IPNewton() and AutoForwardDiff() to achieve this.
Part 2: Finding {mu,A} such that the ellipse E(mu,A) minimises the sum of d(xi,E) over i. I have also been able to solve this by using a transform t: A → u (using some properties of the hyperellipsoid), and then using NelderMead() without constraints.
Problem: In Part 2, I wanted to employ some constraints which reflect the physical problem that the system describes and restrict the solution space of E. These constraints are on a function g(A), and not u. My plan was to define a constraining function which employs t_inverse: u → A, and then to use IPNewton() with AutoForwardDiff() as in Part 1. However, I am finding that IPNewton() fails, complaining about the lack of an eigen!() which accepts AutoDiffs (this is my interpretation of a very long and noob-unfriendly error message).
Is that helpful ? I am happy to explain further.
What was supposed to be a quick-and-dirty week-end, first pass over the data has turned into a rather involved module which fits hyperellipsoids. I think there is enough demand in my field for this that I want to release it as a small module that experts can modify, optimise, and hopefully help make faster.
Thanks. You wrote u a couple of times. Did you mean \mu?
Why is it problematic that g is a function of A? A are optimization variables, right?
Maybe you can write your problem in mathematical notation like this:
See page 244 of A Ben-Tal, A Nemirovski for a formulation that gives the smallest volume ellipsoid.
These types of problems can be solved with JuMP. You can see a similar type of worked example in the Minimal ellipses tutorial.
@cvanaret Ooh, I didn’t know you could write notation on discourse!
Right, so, Problem 1:
Given:
A point p \in \mathbb{R^n}
The surface of an ellipse: E(\mu,A) \equiv \{ x\,|\,(x-\mu)^TA(x-\mu) - 1 = 0 \}
A distance metric d(x,y), currently, d(x,y) = \sum_i (x_i-y_i)^2.
Find: d_{min}= \underset{x}{\mathrm{min}} \,\, d(p,x) , s.t., x \in E(\mu,A)
We denote this d_{min} as d(p,E) for a given point p and an ellipse E.
Problem 2
Given:
A set of points {x_i} \in \mathbb{R^n}
d(p,E) as defined above.
Find: {\mu_{opt},A_{opt}} = \underset{\{\mu,A\}}{\mathrm{argmin}} \,\, \sum_i {d(x_i,E(\mu,A))} s.t., l < g(A) < u.
The problem is, the constraint on g(A) defines limits on the eigenvalues of A, and the only way I am able to implement this is by using a constraint function that does an eigen-decomposition - which is where the error comes in. Happy (and very thankful) to take any advice from optimisation experts! I can also share access to a private repo.
Thanks!
PS: The text-editor on discourse is absolutely fantastic - I wish all sites could have something like this!
Thank you! I had looked at the example (the code, not the actual book). I will have a look at the book-chapter as well and see if I can get some inspiration from there. My problem (I describe it in detail in an answer to @cvanaret) is a bit more involved - but maybe it can be reduced to a minimum ellipse problem.
And I tried JuMP and found it to be a bit unintuitive for me. I like working with the code for the actual computation - probably due to my lack of experience with symbolic programming.
I think for an ellipse it has to be symmetric positive definite by definition since it needs to be diagonalisable (sorry if this is theoretically incorrect!).
For Problem 2, I am using this fact so that my optimisation is on the Cholesky decomposition of A, which helps me reduce the number of variables by a little.
The constraint: The diagonisation, say, A = UDU^T then implies that the rotation is encoded in U, and the scaling in a diagonal matrix D. My constraints are on the individual scaling factors in D - I try to ensure that the inverse of each diagonal element r_{ii} = 1/d_{ii} lies between r_l and r_u, where r_l and r_u are the extrema of the pairwise distances of elements in the set {x_i}. This is the reason I need to do an eigen-decomposition of A (of course, I first compute A from the Cholesky lower-triangular matrix L as LL^T).
Thanks for the details. An immediate (but probably dumb) idea would be to form A “constructively”: optimize wrt the d_{ii} \in [\frac{1}{r_u}, \frac{1}{r_l}] and build A = UDU^T. I’ll think about it some more
@jd-foster This is awesome, thank you very much! I will try to see if I can adapt your very clear example to fit my needs. In my case, the problem is to find an ellipse E(\mu,A) such that the mean distance (squared-Euclidean for now) of its surface/boundary from the set \mathcal{S} is minimised. In addition, I also have constraints on the size of the fitted ellipse which relate to the eigenvalues of A.
I tried to put it in some sort of mathematical notation in my response to @cvanaret above. I will probably release a very simple toolbox at some point this weekend, and would appreciate any constructive comments from experienced optimisation folk !
Thank you very much! I hope to be able to share some code this weekend, so any comments or ideas will be very welcome.
I think the model knows about the symmetric PSD requirement because I construct the ellipse using the Cholesky decomposition. So the ellipse is actually given by:
E(\mu,L)≡ \{\, x\,\mid\,(x−μ)^TLL^T(x−μ)−1=0\,\}, where L is a lower-triangular matrix.
I write it using A=LL^T because A gives me parameters which are easy to map onto the data (like the eigendecomposition).
I am not sure this helps too much, but if you are optimising over SPD matrices (kind of like a constraint) you might also take a look at Home · Manopt.jl and Home · Manifolds.jl – but it might also just be me working with these and seeing manifolds everywhere. In this case I think its Symmetric positive definite · Manifolds.jl, which stores the full matrix but the manifold dimension is reduced the same way as your Cholesky decomposition also provides.
For example the ellipsoid we do not (yet?) have in Manifolds.
OOh a very nice approach! I have been itching to find ways to use Manifolds.jl in my work I will try to see if I can use the SPD manifold to frame my problem. I need to do a first pass over the data very quickly so perhaps I will stick to the descriptive model I have right now, but I think your approach could be very elegant, thank you ! I’ll definitely be playing around with it.
PS: I think we might have briefly exchanged emails over a problem about projecting dynamical trajectories on optimal manifolds (Manifolds.jl was not the right package for the problem). Great to hear from you again!