Optimization on a manifold

This is both a conceptual question, and a practical one of what packages I could use to solve this.

Problem statement

I am trying to solve the following problem. Let a \in \mathbb{R}^M be a vector of parameters, and p \in \mathbb{R}^N prices. Market clearing F: \mathbb{R}^M \times \mathbb{R}^N \to \mathbb{R}^N requires that

F(a, p) = 0

where F is differentiable in both arguments (both in theory and practice). Take a leap of faith and assume that for each a, there is a unique p(a). I am minimizing some function M of a, p, and data d, i.e.

\min_a M(a, p(a), d)

Currently, I have explored two options.

Inner loop rootfinding

For each a, find p(a) s.t. F(a, p) = 0, e.g. via NLsolve.jl, then plug the objective M(a, p(a), d) into a solver like NLopt.jl or Optim.jl. This works, but is somewhat expensive, and differentiation is tricky (but doable).
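For concreteness, here is a minimal self-contained sketch of the nested setup on a made-up F (a hand-rolled Newton iteration stands in for NLsolve.jl, and the outer solver is omitted; all names are illustrative, not from the actual problem):

```julia
# Toy market clearing: F(a, p) = p .+ exp.(p) .- a is strictly increasing in p,
# so each a has a unique root p(a).

# Inner loop: Newton iteration for p(a) s.t. F(a, p) = 0
# (a stand-in for NLsolve.jl's nlsolve).
function solve_p(a; p0 = zeros(length(a)), tol = 1e-12, maxit = 50)
    p = copy(p0)
    for _ in 1:maxit
        Fval = p .+ exp.(p) .- a
        maximum(abs, Fval) < tol && break
        p .-= Fval ./ (1 .+ exp.(p))   # Jacobian is diagonal for this toy F
    end
    return p
end

# Outer objective M(a, p(a), d); a real problem would hand this to
# Optim.jl or NLopt.jl.
objective(a, d) = sum(abs2, solve_p(a) .- d)
```

The outer solver then only ever sees a, with p(a) recomputed inside every objective evaluation.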

Penalty

Optimize M(a, p, d) + \lambda \| F(a, p) \|^2_2 or similar in (a, p). The idea is that the market-clearing p will also be optimal, so F(a, p) = 0 anyway. But in practice this does not always work: I get stuck in local optima with F \ne 0.
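A toy version of this penalized formulation, stacking (a, p) into one decision vector (F, M, and the penalty weight are made up for illustration):

```julia
# Penalty approach: optimize over (a, p) jointly, adding λ‖F(a, p)‖²
# to the objective instead of enforcing F = 0 exactly.
F(a, p) = p .+ exp.(p) .- a        # toy market-clearing residual
M(a, p, d) = sum(abs2, p .- d)     # toy objective

function penalized(x, d; λ = 1e3)
    n = length(d)
    a, p = x[1:n], x[n+1:end]      # unstack the combined decision vector
    return M(a, p, d) + λ * sum(abs2, F(a, p))
end
```

The penalized objective is exact at points where F vanishes, but for finite λ the optimizer can trade off constraint violation against the objective, which is one source of the spurious local optima.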

Penalty + renormalization

Run the optimizer above for a bit, then if F(a, p) gets “large”, reach for the rootfinder.

Suggestions are welcome. I can make an MWE, but the actual problem is about 20k LOC so it is not possible to condense all the quirks (in particular, nonlinearity and occasional ill-conditioning) into a simple example.

In particular, I am wondering if there is a systematic way of following the implicit p via homotopy.

Looks like a problem for Ipopt with equality constraints. Throw it in Nonconvex and see if it works.

I’ve worked with this kind of problem in the context of Multidisciplinary Design Optimization. Experts use two categories of methods:

  • the rootfinding approach you describe: you solve F(a, p) = 0 for a given a. Then you compute the (analytical) coupled derivatives \frac{dp}{da} by using the implicit function theorem (it yields \frac{dp}{da} = -[\frac{\partial F}{\partial p}]^{-1} \frac{\partial F}{\partial a}) and plug it in the gradients needed by the solver: \frac{dM}{da} = \frac{\partial M}{\partial a} + \frac{\partial M}{\partial p} \frac{dp}{da} ;
  • you optimize on a AND p and use F(a, p) = 0 as an equality constraint. The problem is larger, but you also have more degrees of freedom because you decouple the problem.
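For concreteness, the first bullet's coupled-derivative computation can be checked on a linear toy problem F(a, p) = Bp - a, where p(a) = B \ a and \frac{dp}{da} is known exactly (everything here is made up for illustration):

```julia
using LinearAlgebra

# Linear toy: F(a, p) = B*p - a, so ∂F/∂p = B and ∂F/∂a = -I.
B = [2.0 1.0; 0.0 3.0]
∂F∂p = B
∂F∂a = -Matrix{Float64}(I, 2, 2)

# Implicit function theorem: dp/da = -(∂F/∂p)⁻¹ ∂F/∂a.
# Solve rather than invert in real code.
dpda = -(∂F∂p \ ∂F∂a)

# Chain rule for M(a, p(a), d) = ‖p - d‖², where ∂M/∂a = 0 here:
# dM/da = ∂M/∂a + (dp/da)' ∂M/∂p
grad_M(a, d) = (p = B \ a; dpda' * (2 .* (p .- d)))
```

For this linear F, dpda coincides with inv(B), and the total gradient vanishes exactly at p(a) = d, which is a cheap sanity check before wiring the same pattern into an AD framework.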

Could you provide the mathematical form of the functions F and M? You may be able to solve this with JuMP, probably using the Ipopt solver. In JuMP you can define an objective function to minimise M (with variables a and p) and constraints that set F equal to zero, but it is difficult to suggest anything with the information provided in the question.

I’d second @cvanaret’s second point. As you probably know, this is called the MPEC strategy in economics, following Su and Judd ECMA 2012. In practice I’ve had much more success with KNITRO than with Ipopt.

In MDO, the first approach is often called nested analysis and design (NAND) while the second is simultaneous analysis and design (SAND). Note that you can adapt the tolerance in the root finder based on the KKT residual of the optimiser to improve the performance of NAND but this isn’t trivial to implement with C solvers. SAND can in theory be slower because of the larger number of variables but it does avoid the problem of choosing 2 termination criteria for the analysis and design so it may be faster in practice depending on your problem. In general, any structure, e.g. sparsity, in your problem should be exploitable in both cases but it depends on the solvers you use.

MPEC would rather be the case where you have nested optimization problems (or bilevel programming) and you write the inner problem’s optimality conditions as constraints in the outer problem.

Conceptually speaking:

What is the issue with differentiation? You don’t differentiate through the solver, you just use implicit differentiation: \partial_aF + \partial_p F\partial_ap = 0. This is different from differentiating through the solver, because your nonlinear solver uses a fixed point iteration, and you only use the AD framework on the last step when you’re close to the fixed point.

What is the problem with expensive? You should have very good initial guesses for p=p(a) during optimization, simply by using a previous already computed \hat a, i.e. p(a)\approx p(\hat a) (and if you differentiated, p(a) \approx p(\hat a) + \partial_a p(\hat a) \cdot (a - \hat a)). With that, the nonlinear solver should hopefully often need only 1-2 Newton steps per optimizer step. (In a certain sense, this is “solving p via homotopy”, i.e. using small changes in a that keep you in the good Newton regime.)
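A toy sketch of this warm-starting idea, counting inner iterations (the scalar-wise Newton and the specific F are made up for illustration):

```julia
# Warm-starting: reuse p(â) from the previous optimizer step as the
# initial guess when a changes slightly. Toy F(a, p) = p .+ exp.(p) .- a.
function newton_p(a, p0; tol = 1e-12, maxit = 50)
    p, iters = copy(p0), 0
    while maximum(abs, p .+ exp.(p) .- a) > tol && iters < maxit
        p .-= (p .+ exp.(p) .- a) ./ (1 .+ exp.(p))
        iters += 1
    end
    return p, iters
end

p_hat, _  = newton_p([1.0], zeros(1))   # solve at â
p_new, it = newton_p([1.01], p_hat)     # warm start at a nearby a
```

For a small step in a, the warm-started solve converges in a handful of Newton iterations, which is the point: each outer optimizer step only pays for a cheap correction, not a full cold-start solve.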

In case F misbehaves (badly conditioned \partial_p F), you can just choose a different splitting of your \mathbb{R}^{M+N}.

Thanks everyone for the answers. I coded an MWE, too large to paste here, at

The solution attempts are in scripts/. Currently,

  • Nonconvex.jl fails with Percival (negative minimum, should be impossible) and Ipopt (segfault).
  • NLopt.jl solvers SLSQP and AUGLAG fail with FORCED_STOP.

Help is appreciated, maybe I am making an obvious error.

This should be feasible, but I don’t know how to interface this with available optimization packages. Maybe I could define a custom object and use OptimKit.jl to perform the inner rootfinding step.

The function is huge, 10k lines of custom code, solving and simulating an economic model. I am not sure JuMP is a good match for this, they warn people away from using it for black box functions.

I ran the Nonconvex code with Percival and got:

julia> sol.minimum
0.0

julia> sol.minimizer
10-element Vector{Float64}:
 0.359334233588906
 0.3196139200199018
 0.5590339287643089
 0.7590237566438538
 0.9632381979885252
 4.349953166640202
 5.425757417828363
 6.080448515906158
 0.09678771581978313
 0.09678814339585734

no negative values.

Also Ipopt doesn’t segfault on my machine but it errors because the gradient of the objective has Inf or NaN sometimes. I should probably handle this case better.

Thanks for checking this. That version actually started from the optimum; I have now modified it, and Percival fails (note that this does not reflect badly on Nonconvex.jl and/or Percival.jl; the problem is nasty).

You are right about Inf: that’s what the problem returns for infeasible points (i.e. points for which there is no objective defined). These are not possible to formulate as a constraint in the usual sense, since feasibility is only learned during equilibrium computation, so I left it in the MWE.

Some packages deal OK with Inf for infeasible points, it is unclear what Nonconvex.jl does.

Generally, gradient-based constrained optimization packages require that the function and its gradient be defined and finite, unless the optimizer is set up to handle this case, e.g. by backtracking. I don’t know if Ipopt can handle non-finite objectives, but this is not what’s happening here: the objective is finite and the gradient is not. Try printing the values.

This is the problematic point:

julia> infx
10-element Vector{Float64}:
   0.5999999930791519
   0.3000000051503569
 119.88975646123951
   0.0
   1.0999999978216441
  11.110513653958316
  12.488658552547799
  13.281830014090165
   5.718834275004883
   5.718834275002149

julia> objective(infx)
distance = 557150.9425416458
557150.9425416458

julia> Zygote.gradient(objective, infx)
distance = 557150.9425416458
distance = Dual{ForwardDiff.Tag{typeof(objective), Float64}}(557150.9425416458,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN)
([NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN],)

I have not found Ipopt to be very useful in these sorts of big and ugly functions.

Give a commercial optimizer such as Knitro a shot. Even when they use similar algorithms to the open-source optimizers, they add all sorts of heuristics, tuning tricks, and options, and have convenient multistart. And if you find that you are best off writing your constraints as complementarity constraints (as often pop out of FOCs), it has specialized algorithms.

Try it out with a demo version and just let it use finite differences. If it finds the right solutions (with or without multistart), then you can see about getting smarter Jacobians and gradients.

I have not tried this, but know the algorithm and the C++ version is good stuff. So, you might look at

This should be able to handle failure of the objective function to return a value. By the way, it’s better to return a NaN than an Inf in the event the optimizer tries to use the return value. It will not exploit any smoothness you have as well as a traditional gradient-based method, but that’s part of the price you pay for this kind of algorithm.
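As a sketch of that NaN convention (`toy_feasible` and `toy_objective` are made-up stand-ins, not the MWE's actual functions):

```julia
# Return NaN, not Inf, when the objective is undefined at a point.
# NaN propagates unambiguously through any arithmetic the optimizer
# might do with the return value.
toy_feasible(x) = all(>(0), x)      # stand-in feasibility check
toy_objective(x) = sum(log, x)      # only defined for x .> 0

safe_objective(x) = toy_feasible(x) ? toy_objective(x) : NaN
```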

@mohamed82008, thanks again for all the help. One of the bounds is actually a strict > 0 one, and reaching it will indeed give a NaN. I bounded it away from zero, and actually got Nonconvex.AugLag() working! I am pretty excited about this as it seems to be rather fast.

The repo now has a list of 1000 random feasible starting points in a box for comparable tests, if anyone wants to experiment with their favorite algorithm. AugLag finds the optimum from around 5% of them, which is pretty promising. This was achieved by taking logs of the wages, which were somewhat nonlinear, and scaling the problem along that dimension by about 1000; otherwise it tries to evaluate outside the bounds.

Still could not get Ipopt (via Nonconvex) working, it fails with EXIT: Restoration Failed!.

Despite my best efforts, both NLopt algorithms fail with FORCED_STOP. If someone is familiar with NLopt internals, help would be appreciated.

I am considering black box / commercial packages as a last resort only, for several reasons. From a philosophical perspective, it makes replication difficult in practice. I have coded TikTak for multistart in MultistartOptimization.jl and find it nice. Also, having a full Julia stack is practically advantageous for debugging.

Thanks, but now I have reworked the code to be ADable, at least via ForwardDiff, and find it such an improvement that I would be reluctant to go back to a derivative-free algorithm.

Yes! Derivative-free methods only make sense when you can’t get gradients for some reason. Avoid them if you can.