How to best model different domains in Optimization, especially Manifolds?

Recently there have been some approaches here and here to introduce (my favourite topic) optimization on manifolds into general optimization frameworks.

One thing that might be missing, and that I would like to discuss, is:

How should we best model the case where the domain is a manifold?

The usual case JuliaManifolds deals with is a manifold given explicitly by some representation. In this case a dependency on ManifoldsBase.jl would probably resolve most requirements.
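
For illustration, a minimal sketch of what the explicit case looks like with the ManifoldsBase.jl/Manifolds.jl interface (the functions exp and is_point used here do exist there; the concrete numbers are just an example):

using Manifolds  # brings in the ManifoldsBase.jl interface

M = Sphere(2)            # unit sphere in ℝ³, an explicitly represented manifold
p = [0.0, 0.0, 1.0]      # a point on M, here a plain vector
X = [1.0, 0.0, 0.0]      # a tangent vector at p
q = exp(M, p, X)         # move along the geodesic from p in direction X
is_point(M, q)           # true – q stays on the manifold by construction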

On the other hand there might also be implicitly given manifolds, for example a constraint to a surface or some other (in)equality constraint set.

I think this might affect general frameworks (like Optimization.jl and MathOptInterface.jl),

and maybe even single solver packages like

  • Optim.jl (which has some support for manifolds)
  • maybe also others that think about providing such support, if they want to provide something special in these cases?

For me personally, in Manopt.jl just having the manifold passed around as well is fine, but maybe there is a nicer model / way to handle this in the “global optimization infrastructure”?
This might also involve discussing the representation of points on the manifold and of tangent vectors (“directions to walk into”), since they might not always be “plain arrays”, as classical (i.e. Euclidean) optimization often implicitly assumes.

So any ideas on how to best model that and where to provide such a model would be very welcome!


I think a general way to define a manifold that is then either used to automatically define constraints or do something smarter is something Optimization.jl needs to do. We will follow your lead on this one.


I would definitely like to continue with my PR to Optimization.jl, but at the moment I’m a bit stuck on making the library manifold-aware. I think ideally there should be manifold constraints that are solver-independent, and mapped to each solver either as a manifold or as implicit constraints. It would be easier if all optimization libraries that deal with manifolds used the same manifold API, but we are not there currently. I was thinking about proposing a switch in Optim.jl, but that would be a breaking change, so it would add work for many people, most of whom likely wouldn’t care about the new functionality. On the other hand, Optimization.jl can handle incompatibilities between libraries, which wouldn’t be breaking but may be a bit messy. I would like to have some input from the maintainers of Optimization.jl before continuing.


For now, I just had the chance to discuss this with @mateuszbaran.

For now we propose the following:

For embedded manifolds, that is, manifolds that are defined and represented in some embedding, we will for now consider special constraints. This would be the case, for example, for the sphere, which is embedded in its surrounding Euclidean space.

This

  • already covers many manifolds
  • for now avoids considering special domains
  • allows one to “translate” between (classical) constraints and manifolds – at least from manifolds to classical constraints, maybe even back (see the sketch below)
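
As a sketch of the “translate to classical constraints” direction, assuming JuMP syntax (the quadratic equality below is just the defining equation of the embedded sphere; there is no such automatic translation yet):

using JuMP

model = Model()
@variable(model, x[1:3])
# x ∈ 𝕊² ⊂ ℝ³ written as the classical equality constraint ‖x‖² = 1
@constraint(model, sum(x[i]^2 for i in 1:3) == 1)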

The domain question here would extend this picture to all manifolds – and for the embedded ones provide a second way of defining optimization problems. However, this would probably also mean redesigning at least packages like Optimization.jl and MathOptInterface.jl to allow for specifying a domain.

I will think about such a domain idea, but with the constraint idea/approach from Mateusz we should be able to take a very good first step.


However, this would probably also mean redesigning at least packages like Optimization.jl and MathOptInterface.jl to allow for specifying a domain.

The standard form of MathOptInterface is:

min f_0(x), subject to f_i(x) in S_i, where the f_i are scalar- or vector-valued functions and each S_i is a set.

We don’t really explicitly state what the underlying space is, but I guess it’s assumed that x is a variable in Euclidean space.

The MathOptInterface approach would be to define a manifold/domain as a set, and then let the optimizer deal with the underlying interpretation. This is already implemented in a generic way, so for example, you can do this:

julia> using JuMP

julia> struct Sphere <: MOI.AbstractVectorSet end

julia> MOI.dimension(::Sphere) = 3

julia> model = Model();

julia> @variable(model, x[1:3] in Sphere())
3-element Vector{VariableRef}:
 x[1]
 x[2]
 x[3]

julia> print(model)
Feasibility
Subject to
 [x[1], x[2], x[3]] ∈ Sphere()

You could think of @variable(model, x[1:3]) as short-hand for @variable(model, x[1:3] in Reals(3)) and @variable(model, x[1:3] >= 0) as short-hand for @variable(model, x[1:3] in Nonnegatives(3)).

It’d probably help the design if you come up with a few toy examples with your best-case user interface. (Ignore how hard it is to implement in Julia, or how compatible it is.) Then we could see where to meet in the middle. At the moment I don’t fully understand what you mean by constraints on an arbitrary manifold.


Thanks for your detailed answer.

Constraints

I think the main challenge is really that in many places the x is assumed not only to be Euclidean, but to be <:AbstractArray – or, informally, that + can be used.

Example 1: For fixed-rank matrices, though one could store the matrix itself, it is much (much, much) more efficient to store its SVD, both in memory and in time consumption. But then x is not from (one) Euclidean space.
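
A rough sketch of what such a point looks like in code (modeled after the SVD-based point Manifolds.jl uses for fixed-rank matrices; the type name here is hypothetical):

# A rank-k matrix A = U * Diagonal(S) * Vt stored via its truncated SVD factors.
struct SVDPoint{T}
    U::Matrix{T}   # m × k, orthonormal columns
    S::Vector{T}   # k singular values
    Vt::Matrix{T}  # k × n, orthonormal rows
end
# For m = n = 1000 and k = 5 this stores 2 * 5000 + 5 numbers instead of 10^6,
# but it is clearly not an element of one Euclidean space ℝ^(m×n).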

Example 2: On the sphere, though it is a subset of its embedding Euclidean space, we can take two points p, q of unit norm, but then r = p + q is not of unit norm.
But this might still be something one can specify in constraints.
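
Numerically, for example:

using LinearAlgebra

p = [1.0, 0.0, 0.0]   # unit norm
q = [0.0, 1.0, 0.0]   # unit norm
r = p + q
norm(r)               # √2 ≈ 1.414, so r is not on the sphere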

So here I think it might be challenging to find all the places where “Euclidean-ness” is implicitly assumed (I surprise myself every now and then, even in theory).

And sure, if we have an embedded manifold, using constraints (with augmented Lagrangians and such) is equivalent to the manifold idea – it might even depend on the application which one works better (I am not claiming manifolds solve everything better :wink: ).
And for embedded manifolds, starting by thinking in constraints might be OK, but one thing that does change is that, when thinking intrinsically in manifolds, one is always feasible. Or: you never leave the manifold (for example, since you exchange + for the exponential map).
But this difference could – you are right – still be handled with the right set-constraint, and by then using Manopt.jl as a solver instead of a Euclidean one with a constraint solver.

Also, the constraint approach might cover most manifolds, so that is why I would even propose to start with that.

Non-embedded Manifolds

The other point is that, if we start at the lowest level, the set M (that we would equip with a smooth atlas and a Riemannian metric to get a manifold) does not need to be a subset of a Euclidean space – in addition to the challenges mentioned above.
Hm, a concrete example is maybe a bit harder to come by.

So first imagine that the concept of “global constraints” (like the norm for the sphere) is not necessary; it is OK to have constraints based on “where you are” on the manifold, as long as the manifold dimension stays constant (that is, the dimension of the “directions to move into”, or more precisely of the Euclidean space the manifold is locally isomorphic to). This could probably still be captured by point in Manifold, but we are not necessarily able to write p in (global) coordinates like your x in the code.
That is maybe the main thing I can see code-wise: we do not have global coordinates p[1], ..., p[n] to work with for a non-embedded manifold.

We might for now not have such an example implemented, but conceptually, imagine in code that p is of a type that is just a struct which internally does not consist of a (set of) matrices (I think until now all our points are at least a set of matrices/vectors).
Then the check p in M might consist of “finding the area p is in” and checking the local constraint of said area of the manifold.
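
Purely hypothetically, in code that could look like the following (none of these names exist anywhere; it only illustrates the idea of a struct point without global coordinates):

# A point that only knows which chart ("area") it lives in plus local coordinates.
struct ChartPoint{T}
    chart_id::Symbol    # which local area of the manifold
    coords::Vector{T}   # coordinates valid only within that chart
end

# `p in M` would dispatch to the local constraint of the chart p lives in;
# check_local_constraint is a hypothetical helper.
in_manifold(M, p::ChartPoint) = check_local_constraint(M, p.chart_id, p.coords)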

Sorry that I can not be more concrete here; if I find a concrete example I will definitely mention it :slight_smile:
But in these cases

  1. You can not narrow p down to something like your x where you can constrain the first 3 entries. This is already the case for the SVD-based points we use on FixedRankMatrices (see Fixed-rank matrices · Manifolds.jl) – but sure, there we could map the internal 3 matrices into one large vector if necessary.
  2. You can not specify a “global” constraint, i.e. one (set of) function(s) f_i that holds for all p.

The closest we currently get to an example is probably the shape space of all smooth diffeomorphisms of the sphere, which is an infinite-dimensional manifold; but we have not yet explored all implementation details of this manifold.

Notation / Proposal

Overall – conceptually – I would really prefer a notation like

@domain(model, Manifold)

which in many cases could be equivalent to some constraints.
The default would be the (often not explicitly given) Euclidean(n), where n is the number of variables, since this is the implicit assumption everywhere.

I would propose to start with the constraint idea, since that (with a proper map to the manifold when calling manifold solvers) might be nice.
Later one could have a domain-to-constraints “translator” for cases where such a translation makes sense (e.g. for the sphere).
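
Conceptually, such a translator could be a function mapping a domain to a list of equality constraints where that is possible – a hypothetical sketch, dispatching on the Manifolds.jl types Euclidean and Sphere (nothing of this exists yet):

using Manifolds

# Map a domain to equality constraints h(x) = 0; only illustrates the proposal.
to_constraints(::Euclidean) = Function[]                # no constraint needed
to_constraints(::Sphere)    = [x -> sum(abs2, x) - 1]   # ‖x‖² - 1 = 0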

That would be my proposal (without knowing all details of MOI).

Sorry this post got a bit long, but I hope it describes the differences between domain and constraint.

Immediate questions:

  • Can I have a collection of variables belonging to one manifold and another collection belonging to a different one?
  • Is the manifold a property of the model or the variables?

Perhaps some other examples from JuMP

model = Model()
@variable(model, X[1:3, 1:3], Symmetric)  # X belongs to the space of real-valued symmetric 3x3 matrices
@variable(model, X[1:3, 1:3], PSD)  # X belongs to the space of real-valued symmetric 3x3 matrices that are positive semi-definite
@variable(model, X[1:3, 1:3] in HermitianPSDCone())  # X is a complex-valued matrix that belongs to the space of Hermitian 3x3 matrices that are positive semi-definite

You could imagine

@variable(model, X[1:3, 1:3] in FixedRankMatrices(2))  # Space of matrices with rank two

@blegat wants to add something like this: Add set for low-rank constrained SDP by blegat · Pull Request #2198 · jump-dev/MathOptInterface.jl · GitHub

It’d probably help the design if you come up with a few toy examples with your best-case user interface.

@domain (model, Manifold)

I guess I meant something more concrete. What is model? What is Manifold? What are the decision variables? What is the objective function?

I was really looking for something like

model = Model(Manopt.Optimizer)
@variable(model, x in Sphere())  # Is x a vector? A struct?
@objective(model, Min, norm(x))
optimize!(model)
value(x)

Do you have an example that can’t be represented using some variation of the current JuMP syntax?


I see. At a few points I think we have to check again what we are speaking about, since I think we use different terminology.

That depends quite a bit on how you want to interpret it. In Manopt (besides one solver that needs two manifolds in its design) all solvers work on exactly one manifold, but there is the product manifold M1 × M2 and its special case, the power manifold M1^n. The last case, for example, would be a vector of points all living on the same manifold.
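
In Manifolds.jl code (the × and ^ constructors are, as far as I know, provided for exactly this):

using Manifolds

M1, M2 = Sphere(2), Euclidean(3)
M  = M1 × M2    # product manifold: a point combines a point on M1 and one on M2
Mp = M1^10      # power manifold: a "vector" of 10 points, all on the sphere M1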

That is a very complicated question, because I think the answer is: maybe both?
You could consider the “set part” of the manifold as being a constraint for the variables; but on the other hand you have different choices of metrics per manifold – informally: different ways to define a distance – and this induces the “way to walk around” (or which exponential map to use).

This would more likely be part of the variables, and I would be fine if this also works in a “domain way” and not just a “constraint way” – so maybe my domain idea might also be covered by this x in Manifold variable syntax, if that in is not meant in a constraint way.
This looks nice and would be fine for me then.

Those are good questions. I am not sure what a model is for you, and I usually do not think in “models”; I just see that MOI uses models all the time. Sorry for not reading up on that.

  1. Manifold would be any subtype of AbstractManifold, at least in the JuliaManifolds world
  2. I probably meant mainly the objective to be contained in the model, but maybe also all its derivatives or proximal maps or other tools that help the solver.

In Manopt.jl we do not use the idea of a model; we just have the objective (as described in 2.), which together with a manifold (in the sense of set + metric, represented by a concrete subtype of AbstractManifold) forms a Problem. On the other hand there is a SolverState which specifies which solver to use with which settings. So maybe, from the code you posted, in Manopt the tuple (problem, solver_state) would be a model?

Maybe as a counter-question, since the JuMP docs are just huge and I could not yet find a good intro to all the language/terminology used: what is a model in MOI/JuMP? Maybe it is just what I call a problem? Then the manifold would be part of the model, since changing the metric, for example, would change the problem but not the solver to use.

x could be anything. In Manifolds.jl we have the usual convention: if there is a reasonable representation in a single array, then arrays can be used to represent x – it doesn’t make much sense to introduce a SpherePoint, for example. If there is more than one representation (on Hyperbolic we have 3 vector-like representations: hyperboloid, Poincaré half-plane, Poincaré ball), then one (the hyperboloid) is equivalent to representing points as arrays (i.e. the documented default) and the others are subtypes of AbstractManifoldPoint; the same holds for tangent vectors (informally “directions to walk into”) and TVector.
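
For example (type name PoincareBallPoint as in the Manifolds.jl docs for Hyperbolic; the concrete values are just an illustration):

using Manifolds

M = Hyperbolic(2)
p  = [0.0, 0.0, 1.0]                # default: hyperboloid model, a plain array
pb = PoincareBallPoint([0.0, 0.0])  # the same point as a ball-model struct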

I am honestly not sure. I tried several times to read up on how JuMP is meant to be used to specify things; in 2016 I even thought about basing Manopt on that model, but I always get lost and never find the right terminology, I fear. To me JuMP is a completely different language than the optimization terminology I am used to, and I often feel very lost in the docs. Don’t get me wrong, it’s great that really a lot of things are documented, but I seem to have “grown up” in a different optimization community than the one JuMP is meant for.

But even better that we have this thread now :slight_smile: Good time to try again!

From what I see in the current PR in Manopt, JuMP/MOI can only handle vectors as variables. That would mean Hyperbolic would only work with the hyperboloid model, for example.

Sorry to derail a bit, but @mohamed82008 could ImplicitDifferentiation.jl be used here?

I don’t think so. The problems discussed here are way more primitive:

  • What’s a model?
  • What’s a decision variable?
  • What’s a constraint?

Oscar asked for a concrete example to build the abstraction back from. I don’t think a concrete example was given yet. I do not understand manifold optimisation enough to contribute much to this discussion. But here is what I am gathering from this discussion so far.

It seems that manifold optimisation assumes the decision variable is an arbitrary object with a different “grammar”, e.g. + and * mean different things or are not allowed. This calls for more specialised or generalised optimisation algorithms that work with such new grammar than are commonly used for regular (constrained) optimisation on Euclidean spaces. Can one software “just work” on both constrained Euclidean and manifold optimisation problems? Perhaps, but the value proposition seems somewhat weak so far. What problem will this unification enable solving that the current ecosystem does not solve? Why not just convert the problem from the manifold language to the constrained Euclidean language using a convert-like function instead of having one package for both? If we can’t answer these questions, I don’t think it’s worth spending time implementing a single unified optimisation package for both. But I am just a random guy on the internet so you can do whatever you want of course.


Worse :wink: – for the “decision variable” (for me just a variable), * is usually not even defined.

I think the general thing this should enable is

  1. Constraint problems can sometimes be rephrased on manifolds, which might be more efficient in some cases – being able to test that easily is the goal here
  2. making manifold optimization more visible (without having to read a whole differential geometry book upfront).

But besides that, a general framework like MOI/JuMP or Optimization.jl should maybe also cover such special optimisers – at least I would be happy if they could.
For implicitly defined manifolds, for now not even the code for the manifold exists, so I also think implicit differentiation might not be that fitting.

And sure, at least I am still at the level of checking what the single terms are and how they do/could/should/can fit this generalisation to manifolds (a generalisation, because one can always define a problem on Euclidean(n) and have a “classical”, Euclidean problem).

I think part of the problem of this discussion is mixing multiple levels of manifold abstractions. Most (if not all) manifold-based approaches are applicable to plain Euclidean optimization with certain constraints, and for some problems they are state-of-the-art. I agree that the value of extending general optimization APIs to more abstract optimization domains is somewhat unclear but hooking up solvers that exploit manifold constraints to Euclidean APIs makes sense IMO. Optim.jl does that already with its manifold type and it enables some additional opportunities for the solver. JuMP also has some structured constraints that are exploited by solvers. IMO this discussion is mostly about figuring out if there can be a common manifold constraint representation between Optim.jl/Manopt.jl/JuMP.jl/etc. that works in those Euclidean-with-constraints cases. Mostly to avoid re-implementing advanced solvers for each library.

Optim.jl could actually easily switch to JuliaManifolds abstractions and get support for more manifolds that way but I didn’t make a PR because that would be a breaking change and there is no clear support from its maintainers for it.

Also, if you are not much into manifold optimization, note that for example the idea of the so-called natural gradient is actually derived from (Riemannian) manifold optimization, and quite widely used in some applications.

Converting a problem from a manifold language to Euclidean-with-constraints language is definitely doable with a reasonable amount of effort in many cases.

Also, there is the case of mixing manifold and generic constraints, which is an active area of research.

I doubt that’s relevant. The key thing that makes manifold optimizers efficient is the existence of tools like retraction or vector transport that are fast to compute, which restricts applicability to a few families of spaces and their transformations. There are few enough things to implement that they are derived and implemented by hand instead of relying on AD.


I also think that starting with @variable(model, x in Sphere(2)) sounds like a very good plan (compared to a @constraint, this could even be considered a domain?) and we could first check how far we get and whether we can maybe even cover everything with that already.

The only thing a bit worrying for me (at least) is that, until now, it seems x has to be a vector/matrix? That might at first be the only limiting thing in such an idea.

that until now it seems x has to be a vector/matrix?

Do you have an example of a manifold where the element is not a vector or matrix?

I plucked a few arbitrary manifolds, and they’re all defined in set notation.

@variable(model, x in Sphere(2))

What is the type of x? What operations can I perform on it?


I don’t have much experience with the packages discussed here, and am definitely not an expert on optimization over manifolds. So I’m a little hesitant to stop just reading and jump in. But as I interpret it, you seem to be asking for an example for which the manifold is not an embedded submanifold of R^n or R^{m\times n}. Some time ago, I had a problem that I solved using optimization on Grassmann manifolds (\mbox{Gr}(p,n), the set of subspaces of dimension p in R^n). Matrices weren’t too far away, since I ultimately worked with matrices and retractions. But it’s a quotient manifold and not an embedded submanifold of R^{m\times n}. If you try to view a problem involving subspaces in terms of optimization over orthonormal bases for the subspace, you get the Stiefel manifold, which is an embedded submanifold of R^{m\times n}, but with a dimension that is different from that of \mbox{Gr}(p,n). They are distinct and I think the Grassmann manifold is a nicer approach when dealing with subspaces.


I don’t mean that the element is in the reals per se. I mean, concretely, what is an example where the element of the manifold cannot be represented in code by a vector or matrix.

For your Grassmann example: Grassmann · Manifolds.jl

Could I not write:

model = Model()
@variable(model, X[1:n, 1:k] in Grassmann(n, k))

It seems like the definition of Grassmann is any basis formed of k independent vectors in R^n. Don’t these vectors have to be made explicit at some point? If I optimize over the manifold and obtain an optimal solution, won’t it be the set of k basis vectors X and their weights y in R^k so that x = X * y?

Some time ago, I had a problem that I solved using optimization on Grassmann manifolds

How did you do this? What was the formulation? What was the solution?

Oscar asked for a concrete example to build the abstraction back from. I don’t think a concrete example was given yet.

I guess to echo @mohamed82008, this discussion is very abstract, and my math knowledge doesn’t go as far as quotient manifolds and retractions :sweat_smile:

For example, these might be wildly incorrect (so humor me), but I’m looking for something like:

# Minimize the projection of the point `z` onto the Sphere manifold
z = [0.5, 0.5]
model = Model()
@variable(model, x[1:2] in Sphere(2))
@objective(model, Min, norm(x - z))

# Fit a 2-d hyperplane through a set of points in 3-d space
model = Model()
@variable(model, X[1:3, 1:2] in Grassmann(3, 2))
data = [rand(3) for _ in 1:10]
@objective(model, Min, norm(distance(X, y) for y in data))

In very many cases x is a matrix or vector from a certain set – as in your three examples; but you are not allowed to add two points, nor to scale one. Let’s again take the example of the 2-sphere (it’s just easier to visualise conceptually) as unit vectors in R^3. Then \alpha x would not be on the sphere (unless \alpha = \pm 1), and for two points x, y their sum is not of unit norm (besides a few very special cases).

Instead you have the (so-called) tangent space T_p\mathbb S – which for the 2-sphere is really the plane tangent at a point; imagine these as “directions you can walk into”. And “walking” (in Euclidean space just x + v, where v is any vector/direction to walk into) on the sphere means taking “the shortest path in that direction”, which is to follow great arcs. This is an example of the exponential map (which replaces +, but uses two different objects – points and tangent vectors). And, to already use a term from @mstewart, the exponential map might be expensive to evaluate, so one often uses approximations to it – the so-called retractions.
Locally both can be inverted (but usually not globally) and we get the logarithmic map (conceptually replacing minus) and inverse retractions (approximations to log).
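
For the 2-sphere the exponential map even has a simple closed form – written out by hand as a sketch (Manifolds.jl provides this as exp(Sphere(2), p, X)):

using LinearAlgebra

# Exponential map on the unit sphere: follow the great circle from p in direction X.
function sphere_exp(p, X)
    nX = norm(X)
    return iszero(nX) ? copy(p) : cos(nX) * p + sin(nX) * (X / nX)
end

q = sphere_exp([0.0, 0.0, 1.0], [π/2, 0.0, 0.0])  # ≈ [1, 0, 0], still unit norm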

Grassmann is indeed conceptually a bit complicated, but in practice still a very nice manifold to work with numerically. But let’s start one step before. Stiefel(n,k) is the manifold consisting of all k-dimensional ONBs in \mathbb R^n. These can easily be stored in an (n x k) matrix, and we get the set of all matrices of that size which fulfil X^T X = I. Again, one can not add these matrices and stay on Stiefel – all the arguments from before apply.

Now for Grassmann we do not care which basis we have; we are interested in the subspace. But we can still use X from Stiefel to represent the space. It is just that this is not unique: any Y such that \operatorname{span}(X) = \operatorname{span}(Y) represents the same point on Grassmann – or in other words, distance(M, X, Y) is 0, or isapprox(M, X, Y; atol=0) is true. But besides that, this is a very concrete representation :slight_smile:
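
A small numerical illustration of this non-uniqueness, using distance from Manifolds.jl as just mentioned (the concrete matrices are only an example):

using Manifolds

M = Grassmann(3, 2)
X = [1.0 0.0; 0.0 1.0; 0.0 0.0]    # one ONB of the x-y-plane in ℝ³
Y = X * [0.0 1.0; 1.0 0.0]         # columns swapped: a different ONB, same span
distance(M, X, Y)                  # 0 – X and Y are the same Grassmann point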

But in principle we still have nice matrices to represent the points (in a non-unique way), which works fine. So

this would work just fine :slight_smile:

But there is again a but. There is a second way (though not often used) to represent the space \operatorname{span}(X), namely by the projection matrix P \in \mathbb R^{n\times n}. So we use a type to distinguish them – where plain matrices are interpreted as the Stiefel representation – but for completeness we have a StiefelPoint and a ProjectorPoint type here. So to distinguish these, we would again need that x can be a struct.
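
Sketched with the two types just mentioned (StiefelPoint and ProjectorPoint as in the Manifolds.jl Grassmann docs, if I recall the constructors correctly):

using Manifolds

X = [1.0 0.0; 0.0 1.0; 0.0 0.0]      # Stiefel representative of a 2-plane in ℝ³
p_basis     = StiefelPoint(X)        # the point as a (non-unique) basis
p_projector = ProjectorPoint(X * X') # the same point as the projector P = XXᵀ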

Totally fine, I am very happy to explain as much as I can (and hopefully be a bit more knowledgeable about MOI/JuMP afterwards as well).

So you are looking for the closest point on the sphere to z? That would be correctly written, but it has a closed-form solution (just normalise z).
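
In code, the closed form is just (with an example z):

using LinearAlgebra

z = [0.5, 0.5, 0.5]
x_star = z / norm(z)   # closest point on the unit sphere to z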

Maybe not that easy to compute, since I would have to think a bit about the gradient of the objective, but besides that this looks fine. The only thing that would be complicated is if you want to use StiefelPoint as the type of x (but luckily that representation is equivalent) – or, worse, if you want to represent X by projection matrices. Then you would need a struct in the @variable.


@blegat seems to be making good progress with https://github.com/JuliaManifolds/Manopt.jl/pull/264.

He’s implementing exactly what I had in mind:

Yes, I am helping here and there – and as I said, we should get reasonably far with @variable, with the careful caveat that with just arrays we probably only cover a not-so-large part of Manifolds.jl.

It sounds like I reinterpreted your question to make it too abstract. You certainly can choose a matrix with columns spanning an element of Grassmann(n,k), and computationally you need a concrete representation. After that it’s a matter of how you handle search directions and line search, which in a conceptual sense will likely involve retractions. Computationally everything is done with matrices, but there are definitely some aspects related to representing the subspaces (in terms of points on the Stiefel manifold or in terms of projectors, as @kellertuer noted) and to computing and representing search directions in the tangent space (e.g. gradients) that are different from just minimizing a function over R^{n\times k}.

Just picking a basis for the subspace is not a problem. My own efforts in this area were prior to starting to use Julia and I still haven’t done any optimization in Julia. So I don’t have any idea if there would be any more subtle pitfalls associated with setting up problems to solve with those types of algorithms in JuMP.

There aren’t really any weights. A matrix X with columns spanning the optimal subspace is what you would want as solution. In my case, the full problem formulation was somewhat messier and probably not that relevant. I was looking at a problem formulated in terms of pairs of subspaces, so really it was a product of Grassmann manifolds and I ended up with a pair of matrices (X,Y) with each matrix providing a basis for a subspace. I stuck to the framework described in Optimization Algorithms on Matrix Manifolds. The abstract formulation of the algorithm was Algorithm 1 from chapter 4. Steepest descent for Rayleigh quotients on Grassmann(n,1) is also given as an example in chapter 4. That’s probably the simplest possible example of a problem formulated in this way, but you would admittedly have to wade through a lot of terminology and notation. My own problem formulation was similar, but messier. And I eventually settled on an approximate Gauss-Newton direction instead of steepest descent.

I was trying to answer the specific questions, but I suspect some of that is not very directly helpful. I think that @kellertuer is giving nicer explanations of the general issues than I can give. I learned what I learned specifically to solve that problem. It’s not an area in which I have especially broad experience. I did come away impressed with how nicely standard techniques for unconstrained optimization transfer over. That Algorithm 1 is general for any Riemannian manifold using the exponential map as the retraction if you don’t have something easier. And there is general convergence theory that transfers over as well. It’s really nice how well it works and how efficient the algorithms can be.
