Optimization on a manifold

I am 100% with you. Replication is essential.

But the other thing I have found is that more robust optimizers often make it easier to get a sense of the domains of attraction from different initial conditions (and, frequently, they surface problems in my models which need to be fixed, or corners that I didn’t anticipate would ever bind).

In particular, you can tell these optimizers to emphasize feasibility over optimality, and then they can carefully walk along the F(a,p) = 0 manifold, which might be essential for the stability of your other code. Another thing you can do is fix an initial a and tell the optimizer to find a feasible point without worrying about the objective. Once you know “the solution”, you can reverse-engineer conditions to guide the less robust optimizers to it, and then drop the commercial one.
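If a robust commercial solver is not at hand, the "feasible point for fixed a" step can also be approximated by hand. Here is a minimal sketch, assuming your model supplies the residual F(a, p) and its Jacobian F_p = \partial F/\partial p (both names are hypothetical placeholders):

```julia
using LinearAlgebra

# Minimal damped-Newton feasibility solve on F(a, p) = 0 for fixed a.
function find_feasible_p(F, F_p, a, p0; tol = 1e-10, maxiter = 100)
    p = copy(p0)
    for _ in 1:maxiter
        r = F(a, p)
        norm(r) < tol && return p
        Δ = F_p(a, p) \ r          # Newton direction
        t = 1.0                    # backtrack until the residual shrinks
        while t > 1e-8 && norm(F(a, p - t * Δ)) ≥ norm(r)
            t /= 2
        end
        p -= t * Δ
    end
    error("no feasible point found from this starting point")
end
```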

But with all that, I completely understand your aversion to experimenting with commercial optimizers. And I suspect that black-box derivative-free ones are never going to work on a problem of this type, because feasibility on the constraint is too important.

1 Like

Try setting some options in Ipopt, e.g. start_with_resto = true or max_soc = 50. You can also use second-order Ipopt or AugLag via the option first_order = false; the default in Nonconvex is first order.
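A minimal sketch of how this might be wired up in Nonconvex.jl (model and x0 stand in for your problem, and the exact keyword handling may differ across versions; note that Ipopt itself takes start_with_resto as the string "yes"):

```julia
using Nonconvex
Nonconvex.@load Ipopt

# Second-order Ipopt (the Nonconvex default is first-order), plus the
# restoration-phase and second-order-correction options suggested above;
# extra keywords are forwarded to Ipopt.
options = IpoptOptions(
    first_order = false,
    start_with_resto = "yes",
    max_soc = 50,
)
result = optimize(model, IpoptAlg(), x0; options = options)
```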

1 Like

Note that you can differentiate M(a,p(a)) with respect to a very efficiently and easily using an adjoint method, a.k.a. implicit functions — given the solution of F(a,p) = 0 and the Jacobian \partial F/\partial p, you only have to solve one linear system with \partial M / \partial p as a right-hand side. See e.g. section 3 of my course notes.
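A minimal Julia sketch of that one-solve recipe (placeholder names, not the Matlab snippet from the notes: Fp and Fa are the Jacobians \partial F/\partial p and \partial F/\partial a, Mp and Ma the gradients of M, all evaluated at the solution of F(a,p) = 0):

```julia
using LinearAlgebra

# Gradient of M(a, p(a)), where p(a) is defined implicitly by F(a, p) = 0.
function adjoint_gradient(Fp, Fa, Mp, Ma)
    λ = Fp' \ Mp          # the single linear solve: (∂F/∂p)ᵀ λ = ∂M/∂p
    return Ma - Fa' * λ   # dM/da = ∂M/∂a - (∂F/∂a)ᵀ λ
end
```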

You could, alternatively, use an optimization algorithm like SLSQP that supports nonlinear equality constraints directly, and solve \min_{a,p} M(a,p) subject to F(a,p)=0.
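For instance, with NLopt.jl this could look like the following toy problem (a hard-coded quadratic objective and a scalar circle constraint standing in for M and F):

```julia
using NLopt

# min (x₁ - 1)² + (x₂ - 2)²  subject to  x₁² + x₂² - 1 = 0
opt = Opt(:LD_SLSQP, 2)
opt.xtol_rel = 1e-8
opt.min_objective = (x, grad) -> begin
    if length(grad) > 0
        grad[1] = 2 * (x[1] - 1)
        grad[2] = 2 * (x[2] - 2)
    end
    (x[1] - 1)^2 + (x[2] - 2)^2
end
equality_constraint!(opt, (x, grad) -> begin
    if length(grad) > 0
        grad[1] = 2 * x[1]
        grad[2] = 2 * x[2]
    end
    x[1]^2 + x[2]^2 - 1
end, 1e-8)
minf, minx, ret = optimize(opt, [0.5, 0.5])
```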

(For example, a better version of your “penalty” method is the augmented Lagrangian method, which adds both a penalty and a Lagrangian term and is designed so that the penalty does not need to become extremely large in order to enforce the constraint to a small tolerance — unlike your simple L2 penalty.)
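For concreteness, the standard augmented Lagrangian (as in, e.g., Nocedal & Wright, ch. 17) is

L_\mu(a,p,\lambda) = M(a,p) + \sum_i \lambda_i F_i(a,p) + \frac{\mu}{2} \sum_i F_i(a,p)^2,

with the multiplier estimates updated after each approximate minimization by \lambda_i \leftarrow \lambda_i + \mu F_i. It is this multiplier update, rather than \mu \to \infty, that drives the constraint violation to zero.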

3 Likes

Thanks. In fact I tried using a constraint (via SLSQP and the augmented Lagrangian), see here:

but I keep getting FORCED_STOP from NLopt and I am not quite sure how to debug that.

Is your code throwing an exception?

I don’t think so. My reading of the manual suggests that NLopt would rethrow those and I would notice; is this correct?
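One generic tactic (my own habit, not something from the NLopt manual): wrap the objective so that non-finite values, which can otherwise surface only as a FORCED_STOP, get logged before NLopt sees them:

```julia
# Wrap an NLopt-style objective f(x, grad) to flag non-finite values.
checked(f) = (x, grad) -> begin
    val = f(x, grad)
    if !isfinite(val) || !all(isfinite, grad)
        @warn "non-finite objective or gradient" x val
    end
    val
end

# usage: opt.min_objective = checked(my_objective)
```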

A follow-up, in case anyone is interested: there were two problems with the original formulation, a conceptual one and a numerical one.

The conceptual problem was that the Stone-Geary preferences we used for modeling non-homotheticity shut down the service sector below some level of GDP: a lot of moments become 0, derivatives have a very different sparsity pattern, and in practice this seems to lead most methods astray whenever the region they are investigating is on or near such a boundary.

This was solved by using methods from a nice recent paper by Comin, Lashkari, and Mestieri (2021), coded up for AD in

Now everything is continuously differentiable.

The numerical problem was occasional floating point overflow for parameters which are nowhere near the optimum, but are nevertheless investigated by the algorithms. It is very easy to get an exp(x) == Inf for a not-so-large x (a bit above 709 for Float64), and when two of these are subtracted, etc., you get NaNs. What was insidious about this is that it occurred mostly in the Hessian, which we did not test originally. While I recognize that coding up a reasonable objective is the responsibility of the user, it would be better if optimizers checked all(isfinite, ...) for everything they use and failed early. I opened an issue for JuliaSmoothOptimizers, which ended up as JuliaSmoothOptimizers/SolverTest.jl/issues/9.
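A minimal REPL illustration of the failure mode and of the early-failure check:

```julia
julia> exp(710.0)                 # Float64 exp overflows for x ≳ 709.8
Inf

julia> exp(710.0) - exp(709.9)    # Inf - Inf
NaN

julia> all(isfinite, [1.0, NaN])  # the check optimizers could run
false
```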

Some benchmarks:

  • Percival.jl (either via Nonconvex.jl or used directly) finds a global optimum 54% of the time from randomly drawn starting points,
  • Ipopt.jl gets it right 64% of the time from the same points,
  • I still can’t get NLopt.jl to work (same failure as noted above).

Again, thanks everyone for the help.

5 Likes

Out of curiosity, what’s wrong with gradient descent or quasi-Newton on the augmented problem?

L(a,p,\lambda) = M + \sum_i \lambda_i F_i ?

(I don’t know anything about this field)

Dear Steve, if possible and available, could you please share the Julia equivalent of the Matlab code snippet displayed in your notes on adjoint methods? I would like to save it with the course notes. Thanks.

Generally you do not know the Lagrange multiplier.

Chapter 17 of Nocedal & Wright is a nice introduction motivating the Augmented Lagrangian method.

2 Likes

Generally you do not know the Lagrange multiplier

Right, thanks for the citation. I looked through the chapter.

It makes perfect sense to me to simultaneously optimize (a,p,\lambda) for each \mu, and anneal a sharp constraint:

L(a,p,\lambda \mid \mu) = M + \sum_i \lambda_i F_i + \frac{\mu}{2} \sum_i F_i^2

My question might be dumber than that: why doesn’t \mu = 0, i.e., taking

L(a,p,\lambda) = M + \sum_i \lambda_i F_i

as the objective function for (a,p,\lambda), just work? Is it because second (partial) derivatives vanish? Are the KKT conditions not usually satisfied, or something?

(The quadratic penalty looks ugly to me because it isn’t invariant to rescaling of F.)

Sorry, I am not sure I understand what you are suggesting above. The Lagrangian is linear in each \lambda_i with slope F_i, so depending on the sign of F_i, blindly minimizing the objective above over \lambda would send \lambda_i \to \pm \infty.

1 Like

You can show that for \mu large enough, the augmented Lagrangian becomes a strongly convex function near the solution (locally, and for multiplier estimates near the optimal ones), even if the original objective and constraints are non-convex. This significantly improves the convergence behavior when minimizing it.
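Sketch of the standard local argument: at the solution x^* (where F = 0), with constraint Jacobian J = \partial F/\partial x, the Hessian of the augmented Lagrangian is

\nabla^2_{xx} L_\mu = \nabla^2_{xx} L + \mu J^T J.

The second-order sufficient conditions make \nabla^2_{xx} L positive definite on the null space of J, and the \mu J^T J term handles the complementary directions, so the sum is positive definite once \mu is large enough.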

2 Likes

Right, I see: for \mu = 0 the optimum is a stationary point, but L is completely hyperbolic (a saddle) there. Apologies for asking for a geometry lesson on Julia Discourse. Thanks for your patience!

1 Like