How to best model different domains in Optimization, especially Manifolds?

In very many cases x is a matrix or vector from a certain set, as in your three examples, but you are not allowed to add two points or to scale one. Let’s again take the example of the 2-sphere (it’s just easier to visualise) as the unit vectors in \mathbb R^3. Then \alpha x would not be on the sphere (unless \alpha = \pm 1), and for two points x, y their sum is not of unit norm (besides a few very special cases).

Instead you have the (so-called) tangent space T_p\mathbb S, which for the 2-sphere is really the plane tangent to the sphere at a point p; imagine its elements as “directions you can walk into”. And “walking” (in Euclidean space just x+v, where v is any vector/direction to walk into) on the sphere means taking “the shortest path in that direction”, which is to follow great arcs. This is an example of the exponential map (which replaces +, but uses two different objects: points and tangent vectors). And to already use a term from @mstewart: the exponential map might be expensive to evaluate, so one often uses approximations to it, the so-called retractions.
Locally both can be inverted (but usually not globally) and we get the logarithmic map (conceptually replacing minus) and inverse retractions (approximations to log).
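
To make this a bit more concrete, here is a minimal sketch of these three maps on the 2-sphere, using only LinearAlgebra. The names sphere_exp, sphere_log and sphere_retract are my own for illustration (not the Manifolds.jl API), and the retraction shown is just one possible choice, the one that does a Euclidean step and normalises afterwards.

```julia
using LinearAlgebra

# Exponential map on the unit sphere: start at p and follow the great arc
# in direction v for length norm(v). Assumes v is tangent at p, dot(p, v) == 0.
function sphere_exp(p, v)
    t = norm(v)
    t == 0 && return copy(p)
    return cos(t) .* p .+ (sin(t) / t) .* v
end

# Logarithmic map: the tangent vector at p pointing towards q whose length is
# the arc length from p to q. Only meaningful locally (not for q == -p).
function sphere_log(p, q)
    c = clamp(dot(p, q), -1.0, 1.0)
    w = q .- c .* p              # part of q orthogonal to p
    nw = norm(w)
    nw == 0 && return zero(p)
    return (acos(c) / nw) .* w
end

# One possible retraction: Euclidean step, then project back by normalising.
sphere_retract(p, v) = (p .+ v) ./ norm(p .+ v)

p = [0.0, 0.0, 1.0]
v = [0.3, -0.2, 0.0]             # tangent at p, since dot(p, v) == 0

q = sphere_exp(p, v)
r = sphere_retract(p, v)

println(norm(p .+ v))                   # larger than 1: p + v left the sphere
println(norm(q), " ", norm(r))          # both 1 up to rounding
println(norm(sphere_log(p, q) .- v))    # close to 0: log undoes exp locally
```

The retraction only needs an addition and a norm, while exp needs sin and cos; on the sphere that hardly matters, but on other manifolds the difference in cost can be large.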

Grassmann is indeed conceptually a bit complicated, but in practice still a very nice manifold to work with numerically. But let’s start one step before. Stiefel(n,k) is the manifold consisting of all k-dimensional ONBs in \mathbb R^n, that is, all sets of k orthonormal vectors. These can easily be stored in an n x k matrix, and we get the set of all matrices of that size which fulfil X^TX = I_k. Again, one cannot add these matrices and stay on Stiefel; all arguments from before apply.
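
As a small sanity check of that last claim, here is a sketch with plain matrices. The helper is_stiefel is a hypothetical name of mine that just tests X^TX = I_k numerically; it is not a library function.

```julia
using LinearAlgebra

# Hypothetical helper: does X have orthonormal columns, i.e. X'X = I_k?
is_stiefel(X; atol=1e-10) = norm(X' * X - I) <= atol

# Two points on Stiefel(3, 2): an ONB of the xy-plane and one of the xz-plane
X = [1.0 0.0; 0.0 1.0; 0.0 0.0]
Y = [1.0 0.0; 0.0 0.0; 0.0 1.0]

println(is_stiefel(X), " ", is_stiefel(Y))   # both are valid points
println(is_stiefel(X + Y))                   # the sum is not on Stiefel
println(is_stiefel(2 .* X))                  # neither is a scaled point
```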

Now for Grassmann we do not care which basis we have, we are interested in the subspace. But we can still use X from Stiefel to represent the space. It is just that this is not unique. Any Y such that \operatorname{span}(X) = \operatorname{span}(Y) represents the same point on Grassmann, or in other words distance(M,X,Y) is 0 or isapprox(M, X, Y; atol=0) is true. But besides that this is a very concrete representation :slight_smile:
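
Here is a rough sketch of that non-uniqueness in plain linear algebra. The function grassmann_distance is a hypothetical helper of mine, assuming the usual distance via principal angles (the singular values of X^TY are their cosines); I am not claiming this is line by line what the library does.

```julia
using LinearAlgebra

# Distance of two subspaces via principal angles: the singular values of X'Y
# are the cosines of the angles between span(X) and span(Y).
function grassmann_distance(X, Y)
    s = clamp.(svdvals(X' * Y), 0.0, 1.0)   # guard against rounding above 1
    return norm(acos.(s))
end

X = [1.0 0.0; 0.0 1.0; 0.0 0.0]             # ONB of the xy-plane
theta = 0.7
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)]
Y = X * Q                                    # another ONB of the same plane
Z = [1.0 0.0; 0.0 0.0; 0.0 1.0]             # ONB of the xz-plane

println(norm(X - Y))                 # clearly nonzero: different matrices
println(grassmann_distance(X, Y))    # zero up to rounding: same subspace
println(grassmann_distance(X, Z))    # about pi/2: the planes share one direction
```

Note that acos is quite sensitive near 1, so the numerically computed distance between X and Y can be around 1e-8 instead of exactly zero.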

But in principle we still have nice matrices to represent the points (in a non-unique way), which works fine. So

this would work just fine :slight_smile:

But there is again a but. There is a second way (though not often used) to represent the space \operatorname{span}(X), namely by the projection matrix P = XX^T \in \mathbb R^{n\times n}. So we use a type to distinguish them, where plain matrices are interpreted as being the Stiefel representation, but for completeness we have a StiefelPoint and a ProjectorPoint type here. So to distinguish these, we would again need that x can be a struct.
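
A short sketch of why the projector representation is unique, again in plain linear algebra: for X with orthonormal columns, XX^T is the orthogonal projection onto \operatorname{span}(X). The helper name projector is mine; this does not use the StiefelPoint or ProjectorPoint types themselves.

```julia
using LinearAlgebra

# Orthogonal projection onto span(X), assuming X has orthonormal columns
projector(X) = X * X'

X = [1.0 0.0; 0.0 1.0; 0.0 0.0]
theta = 0.7
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)]
Y = X * Q                        # different basis of the same subspace

P = projector(X)
println(norm(P - projector(Y)))  # zero up to rounding: the same n×n matrix
println(norm(P * P - P))         # zero up to rounding: P is idempotent
println(tr(P))                   # the trace is the subspace dimension k = 2
```

The price is storage: an n×n matrix instead of n×k, which is presumably one reason this representation is used less often.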

Totally fine, I am very happy to explain as much as I can (and hopefully be a bit more knowledgeable about MOI/JuMP afterwards as well).

So you are looking for the closest point on the sphere to z? This would be correctly written, but it has a closed form solution (just normalise z).
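
The reason is that for unit x we have \|x - z\|^2 = 1 - 2\langle x, z\rangle + \|z\|^2, so minimising the distance means maximising \langle x, z\rangle, which gives x = z/\|z\| for z \neq 0. A small numerical check, as a sketch with an arbitrary z I picked for illustration:

```julia
using LinearAlgebra

z = [1.0, 2.0, 2.0]               # some nonzero point, here with norm(z) == 3
x_star = z ./ norm(z)             # closed form: just normalise z

# Compare against many random points on the sphere
candidates = [normalize(randn(3)) for _ in 1:1000]
best_other = minimum(norm(x .- z) for x in candidates)

println(norm(x_star .- z))        # equals norm(z) - 1 up to rounding
println(best_other >= norm(x_star .- z) - 1e-12)   # no candidate is closer
```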

Maybe not that easy to compute, since I would have to think a bit about the gradient of the objective, but besides that this looks fine. The only complication arises if you want to use StiefelPoint as the type of x (luckily that is equivalent to the plain matrix) or, worse, if you want to represent X by a projection matrix. Then you would need a struct in the @variable.
