Survey of Non-Linear Optimization Modeling Layers in Julia

Thanks for mentioning it; no, I have not yet compared it to other implementations, mainly because the goal in Manopt.jl is a little different. It defines the Riemannian Augmented Lagrangian – that is – it aims at constrained optimisation problems defined on arbitrary Riemannian manifolds. Of course, if some of the variants also work in that setting, it would be great to have them as well :slight_smile:

Doesn’t it boil down to projecting or re-parameterising the augmented Lagrangian subproblem to handle the manifold constraints? The primal step would just need to respect the manifold constraints, iiuc. The core of the algorithm should be comparable to Percival when the manifold used is Euclidean space. Please correct me if I missed something.

No, it does not just boil down to projections.
But yes, if we set M = Euclidean(n), that is, we are on R^n, it should be similar.

The nice thing here is that one obtains a tradeoff:
You can think of a problem f(x), x on M (e.g. the sphere), as turning a constrained problem (in the embedding) into an unconstrained one. The difference is that we cannot do + anymore but have to use a retraction or the exponential map (again on the sphere – “follow great arcs”).
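To make that concrete, here is a minimal sketch, assuming Manifolds.jl (the point and tangent vector are just illustrative): instead of adding a step vector to the iterate, one moves along the exponential map, i.e. along a great arc on the sphere, which keeps the iterate on the manifold by construction.

```julia
# Minimal sketch, assuming Manifolds.jl; p and X are illustrative.
using Manifolds

M = Sphere(2)            # the unit sphere in R^3
p = [1.0, 0.0, 0.0]      # a point on M
X = [0.0, 0.5, 0.0]      # a tangent vector at p

p + X                    # the Euclidean "+" leaves the sphere (the result has norm ≠ 1)
q = exp(M, p, X)         # the exponential map follows a great arc and stays on M
is_point(M, q)           # true
```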

This is also the main difference in the Augmented Lagrangian method; what you call the primal step does not just do a + (since that does not exist on manifolds) but uses a retraction.
Even worse: there is no dual space of a manifold.
So while the general algorithm might be similar (and in that sense “simplifies” to the Euclidean one if M is R^n), the single steps are more involved due to the differential geometry tools required.

Finally – in the current version the Lagrange parameters are indeed real numbers (or vectors, depending on how you write them), but even that might be generalised at some point.

edit: oh, the easiest example where one can see that projection does not work: the symmetric positive definite matrices (a beautiful manifold) form an open set in their embedding (the symmetric n x n matrices), so projection onto the manifold is not well-defined.

Thanks for the useful reply. I need to read more about manifold optimisation. Can’t one project onto the PSD domain by projecting to the symmetric matrices first (or just optimising the lower/upper triangle), followed by a projection of the eigenvalues? Alternatively, a triangular re-parameterisation A = L * L' with positive diagonal on L can be used instead, e.g. as in the Stan Reference Manual. Again, I am probably confusing manifold optimisation with generic re-parameterisation and projection tricks to get rid of constraints, so forgive my ignorance.

The problem is that the eigenvalues are all positive (not just nonnegative, so 0 is not included!), and that is what you cannot project onto.
Optimization on manifolds avoids this intrinsically. In short: instead of “walking along straight lines”, the approach follows “nice” curves that (automatically / intrinsically) stay on the manifold. But even then, there are different metrics on every manifold, and it might be that what you are thinking of is close to Cholesky Symmetric positive definite · Manifolds.jl
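As a small illustration of the “follow nice curves” idea, here is a sketch, assuming Manifolds.jl (the concrete matrices are made up): a straight Euclidean step can leave the set of SPD matrices, while a step along the exponential map of the default (affine-invariant) metric always stays inside.

```julia
# Sketch only, assuming Manifolds.jl; p and X are illustrative.
using Manifolds, LinearAlgebra

M = SymmetricPositiveDefinite(2)
p = [1.0 0.0; 0.0 0.1]     # an SPD point close to the "boundary" of the set
X = [0.0 0.0; 0.0 -0.5]    # a symmetric matrix, i.e. a tangent vector at p

isposdef(p + X)            # false: the straight-line step leaves the SPD matrices
q = exp(M, p, X)           # geodesic step of the affine-invariant metric
is_point(M, q)             # true: q is still symmetric positive definite
```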

Oh, don’t worry, there are surely other nice tricks besides optimisation on manifolds, but reparametrisation and projection is something different that may work for a few manifolds, not for all of them. Also, several manifolds yield a huge reduction in dimension (compared to a constrained optimisation in some embedding); see for example Fixed-rank matrices · Manifolds.jl (instead of nm variables this just requires nk + k + mk).
Ah, but we are getting off topic here, so maybe just to finish: see Nicolas’ nice book (Nicolas Boumal, applied mathematics), which even starts with these embedded manifolds before considering the more general case.

Thanks for the reference and your explanation. I will go through it.

Yes, the structural_simplify pass of ModelingToolkit will now do things like tear the nonlinear systems of constraints, do alias elimination, and substitute out variables it can explicitly solve for. That can greatly reduce the size of the system that needs to be solved if it’s “constraint dominated”. We’ll try to get some demos of this out by the end of the month.
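A minimal sketch of what that pass looks like from the user side, assuming ModelingToolkit.jl; the tiny system here is only illustrative of structural_simplify eliminating variables it can solve for explicitly:

```julia
using ModelingToolkit

@variables x y z
# Toy nonlinear system: y and z can be solved for explicitly and substituted out.
eqs = [0 ~ y - 2x,
       0 ~ z - x^2,
       0 ~ x + y + z - 1]
@named sys = NonlinearSystem(eqs, [x, y, z], [])

simplified = structural_simplify(sys)
equations(simplified)   # fewer equations remain; eliminated variables show up in observed(simplified)
```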

The concern about the size of the derivative code has not been addressed though.

Hi Carleton, could we maybe update the table, since Manopt.jl does cover NL constraints – we for example have ALM (Augmented Lagrangian Method · Manopt.jl) and EPM (Exact Penalty Method · Manopt.jl), though with their sub-solvers they might be a bit more involved than usual, since both are able to solve constrained problems on Riemannian manifolds.

@kellertuer, I would be happy to, but Discourse locks the posts after some time, so I cannot modify it now. Maybe we can ask a site admin to give us a hand?

If you are interested in trying out the OPF problem, I would love to see a Manopt implementation added to the rosetta-opf repo.

Thanks :slight_smile:

Concerning the OPF example, my simple problem is that it looks like so much to implement that I have not yet found the time (especially during a semester where I run my own seminar on the side).

Especially since it is not my area, the sheer number of variables is frightening enough that I fear, if I just try to “implement it straightforwardly”, I’ll spend weeks debugging it all. So that kept me from trying it.
I also fear it will not be that performant, since Manopt.jl is (a) a generic solver and (b) especially focused on manifolds and non-smooth objectives thereon (but sure, M=R^n yields the classical case of NL optimisation).

But I can take a look at some point, when I feel a bit less frightened by the gazillion of variables.

edit: or to be more precise – I am used to just having f(x) (and for the constraints g(x) <= 0 and h(x) = 0), with x just being on some manifold (or in your case some R^n). But it feels like I need 400+ lines of code to assemble that in your OPF? With zero experience in OPF, I feel like that might be a very long hobby project for me.

At the same time I expect it (since the focus of the package is different) to not be really fast, probably last place by far, so that is also not so motivating.

If you can prepare for me an MWE along the lines of these examples below, then I can probably generalize it to the whole OPF problem. Even if Manopt is not super scalable, it is interesting to see what solutions it converges to for whatever size it can solve. The smallest cases have around 30 variables and constraints, which seems not so bad to me. Iterations to convergence could also be an interesting thing to track at any scale that is solvable.

Well, Rosenbrock I can do :wink: Rosenbrock · ManoptExamples.jl. Rosenbrock even gets a bit better when one switches the metric on R^n (but that example is for now only reported in this paper, Sec. 7.2, and not yet in ManoptExamples).

Size should not be a problem, I would think; something with 5*10^5 variables should still be reasonable, even on manifolds. The problem with your first example is really that the setup of the problem takes 400 lines.
Nonconvex solvers are for now not covered much besides Difference-of-Convex (where we have 2 solvers).

And the main differences to the ones that you write are probably

  1. our functions have the manifold upfront, so you would take M=Euclidean(n) and f(M,p) = ... to define the cost at point p
  2. constraints, gradients, … have the same signature, where the gradient can also be given as a mutating function, so grad_f!(M, X, p) computes the gradient of f at p in place of X (see the sketch below).
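A minimal sketch of these two conventions, assuming Manopt.jl and Manifolds.jl; the quadratic cost, the target vector, and the solver call are only illustrative, not part of the OPF discussion:

```julia
using Manopt, Manifolds

n = 3
M = Euclidean(n)
target = [1.0, 2.0, 3.0]

f(M, p) = sum(abs2, p .- target)                  # cost: the manifold comes first
grad_f!(M, X, p) = (X .= 2 .* (p .- target); X)   # mutating gradient, written in place of X

p0 = zeros(n)
q = gradient_descent(M, f, grad_f!, p0; evaluation=InplaceEvaluation())
```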

I can adapt the constrained Rosenbrock in the next days, no problem.

I think I need more details than what is in these docs. First, I need a way to set up the gradient oracle using some automatic system, so that I don’t need to build it explicitly by hand. Second, I need to know how to define the system of constraints for the problem (including equality and inequality constraints) with nonlinear functions and linear bounds on the variables.

If you look closely at these MWEs you will see that they have all of these features. That is one of the reasons I had to make them; they are variants of the standard examples, which can be generalized to the OPF use case.

…and that is the part that would take me weeks :confused:

I have no clue what you mean by gradient oracle, for example. I only know the setup where for the cost function f you provide the gradient function \nabla f. But since all of that is Euclidean, you could use AD for the gradient function. And besides that, you can use AD tools as you like; both the gradient and the cost are really just simple plain functions in Manopt.jl.

Equality and inequality functions are either g: \mathbb R^n \to \mathbb R^m or single functions g_i, i=1,...,m (all just \leq 0) in a vector, but I never distinguish between linear and nonlinear functions; (a) because most of my functions are nonlinear anyway and (b) on manifolds there is no such thing as linearity. Completely the same for the gradients of these.

Bounds do not exist; you will have to write them as inequality constraints. I can check the examples again, but I fear it will just be too much work (none that I can do in a reasonable time), and we would still only learn that a package tailored for problems on manifolds is probably not suitable in terms of speed for you anyway.
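For illustration, a small sketch of that convention with purely made-up bounds: simple variable bounds l <= p <= u can be written as a vector-valued inequality constraint g(M, p) .<= 0 in the f(M, p) signature style described above.

```julia
# Illustrative only; l, u and the constraints are placeholders, not part of the OPF model.
l = [-1.0, -1.0, -1.0]
u = [ 2.0,  2.0,  2.0]

g(M, p) = vcat(l .- p, p .- u)   # every component <= 0  <=>  l .<= p .<= u
h(M, p) = [sum(p) - 1.0]         # an equality constraint h(M, p) = 0, also vector-valued
```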

Overall, my main reason to never use JuMP, for example, is that all the setup I would need to learn there is just much too much overhead for the problems I solve (I usually know my function f(x); no need to declare @variables, I just have x), even though it is probably super useful in your setup.
Since I do not have such a fancy setup (the Rosenbrock example is already the fullest level of complexity I have for setting things up), this might even mean we are not speaking about a nice 400 lines of code but probably much, much more?
And then the question is, would it be worth twiddling all the details you need for the next 4–6 weeks every evening or so? I am not sure.

edit: maybe the two worlds are really just too different. Constraints in Manopt.jl are more like: if you need them, here you are. But the vast majority of problems are unconstrained on manifolds (or in other words: the constraints form a nice set like the Stiefel manifold).

edit2: I at least tried for an hour, see Maybe Manopt? by kellertuer · Pull Request #32 · lanl-ansi/rosetta-opf · GitHub, but as my last edit indicates, I feel we speak two very different “optimization languages”, so I am really unsure it is worth continuing. But let’s discuss that on the concrete MWE in said PR maybe.

I would be happy to, but Discourse locks the posts after some time, so I cannot modify it now. Maybe we can ask a site admin to give us a hand?

I’ve edited the table. Not sure if you can now too; I didn’t see an option to unlock it. Do we still want a tick for AD in Manopt? My understanding was that the column was for packages that handle it behind the scenes, so that the user doesn’t need to define the gradient callbacks.

Thanks @odow! Based on our most recent exploration I think it is correct to add the check to NL constraints, but the AD one should be changed to “X”, as this was meant to reflect a case where the modeling layer provides this “behind the scenes” without the user needing to think about it. Any objections @kellertuer?

Hm,
the one thing where I do object is that on manifolds AD is far more involved, since it requires Euclidean AD plus several conversion tricks. These are automatically available, see the grad_f2_AD example at Use Automatic Differentiation · Manopt.jl, which does quite a bit of work behind the scenes. So could you state what is missing? To me the criteria here are a bit unclear. The post does state “some kind of AD system” – Manopt definitely has that, see the link above – but after that it states “derivative oracles”, where I again do not know what that means (since we “speak” two different optimization languages, it seems).

In the end that is of course your table; you could also argue that Manopt does not finish the benchmark with the NL constraints, so does that count, or do you want to remove that as well? You could even argue that Manopt is not so much tailored to the “Modelling Layers” mentioned in the thread and remove it completely.
I think it is not my place to object to anything here: your thread, your criteria.

edit: And don‘t get me wrong, I am not angry at any of your decisions – I feel quite humbled to be on that list at all, since Manopt.jl is not designed for the tasks that are under consideration here.

Thanks for the explanation @kellertuer. I did not notice this part of Manopt’s functionality for the manifolds.

In the original post, what I was trying to get at with the AD column was that an AD system is provided for user-defined functions. So the Manopt user API, given user-provided definitions of objective, g, and h, would be like this,

augmented_Lagrangian_method(M, objective, g, h)

and not like this,

augmented_Lagrangian_method(M, objective, objective_grad, g, g_grad, h, h_grad)

As you can see from the OPF example we were working on, developing gradient functions by hand is quite difficult once the problem reaches some level of complexity, which is why this feature is needed for the kind of applications I encounter.

Based on this discussion, I suggest we put (partial) under the AD column for Manopt, as AD is supported for the package-provided manifolds but not yet for user-defined functions. How does that sound @kellertuer?

It sounds very fair to phrase that as (partial), since the part we do have is not the main part required here (but more the one on manifolds).

I agree that in any case where a gradient is required, it could be set to something that is automatically computed. I have to think a bit about how to best do that (in a generic sense for all solvers). I will open an issue for that to keep it in mind.

I totally understand that for complicated costs / constraints, computing the gradient by hand (i.e. implementing it yourself) might be a bit challenging.
So I completely understand the request for such a feature.
For constraints I do not expect that to happen anytime soon in Manopt.jl, but for the cost that might actually be the case. And if we add such a feature, it is only fair to do it for the constraints directly as well :slight_smile:
By the way, similarly, if someone has a Euclidean gradient and wants to obtain the Riemannian one.
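For that last point, a minimal sketch of how such a conversion can look, assuming Manifolds.jl and ForwardDiff.jl and an embedded submanifold with the induced metric (the sphere here); for other metrics an additional change-of-metric step would be needed, and the cost function is purely illustrative:

```julia
using Manifolds, ForwardDiff

M = Sphere(2)
f(p) = p[1]^2 + 2p[2]                 # cost written in the embedding coordinates (illustrative)
p = [0.0, 0.0, 1.0]

egrad = ForwardDiff.gradient(f, p)    # Euclidean gradient in the embedding R^3
rgrad = project(M, p, egrad)          # Riemannian gradient: projection onto the tangent space at p
```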

edit: If you want to keep it short, maybe just :yellow_circle:? (And what, by the way, is the difference between the red X and NA?)

Updated the table.

NA is for solvers that don’t use derivatives.