After solving some structural estimation problems in economics with existing tools, I have a more clear idea of the kind of algorithm I am looking for. AFAICT no existing Julia package exists that does everything below — which is not a problem, I am willing to implement an algorithm described in a paper. I am starting this topic to get suggestions.
Specifically, the optimization problem I usually solve can be described as
The constraint and the objective are ideally computed in a single pass, eg [f, c...] = F(x), where the user implements F. F usually involves a lot of computation and is expensive relative to everything else. (Example: heterogeneous agent OG model where c is market clearing and f is sum of squares deviation from observed moments in the data.) The user codes up F.
The Jacobian of c may not be full rank (a lot of methods require that it is). Eg when c is a market clearing condition, it will definitely not be full rank (Walras’s Law), one cannot always reduce it in a nice manner.
Jacobian of F (ie gradient of f and Jacobian of c) can be obtained, usually via AD, but higher-order derivatives are prohibitively expensive to get at every iteration.
F is continuously differentiable.
I have used Percival.jl and NLopt.jl for these kind of problems with some success, where I run into problems is the multi-pass invocation of F to get f and c. I have defined a shim package ObjConsNLPModels.jl to mitigate this in Percival, for NLopt.jl each constraint is a separate invocation of F so it is not practical.
I think I should implement my own version of some algorithm… but I am not sure where to start, there are so many. I have been looking at SQP algorithms but I don’t know much about those, a reference to a paper that allows a clean (if simple) implementation would be nice.
What I do not need:
No bounds on x necessary, the user can map from \mathbb{R}^n to the relevant domain.
The algorithm does not have to be “matrix-free”, as F is so expensive it dominates everything (on the order of seconds). Doing an SVD on a 1000x1000 matrix is considered trivial in comparison.
(Point 1 is bit obscure, but NLopt always evaluates the objective function first and then the constraints, so you can compute the constraints along with the objective function, cache them somewhere temporarily, and then return the cached value when the constraint function is called.)
I find that I very often want this as well, for certain optimization problems over simulations of a dynamical system, both f and c are typically functions of the state trajectory, which is the expensive thing to compute.
Custom solvers are often written to handle, e.g., trajectory-optimization problems, partly to allow exploitation of this kind of problem structure.
You can re-write your constraints as \mathbf{F}(\mathbf{x}) = [f(\mathbf{x}) \, \mathbf{c}(\mathbf{x})']' = [m \, \mathbf{0}']' and maximize m as your new objective wrt both m and \mathbf{x}. Since your objective and constraints are all non-linear and you only need first order algorithms, it shouldn’t really make a difference which formulation to use afaik.
You introduce an additional variable m, and replace the original constraint c(x) = 0 with the new constraint [f(x); c(x)] = [m; \textbf{0}], i.e., you have added the constraint m = f(x). The objective function to maximize is now given by m only.
The benefit would be that the objective function would now be trivial to compute, it’s just the variable m, and all the computationally expensive parts are contained in the composite constraint function that computes [f(x); c(x)]
You are probably right that for the general case (c(x) might be complicated or mean) we do not yet have a package. But if (also only parts of your equality constraints) form a Riemannian manifold (Maybe like x is a matrix with orthonormal columns, then you are on the Stiefel manifold), then Manopt.jl might be something to check out.
It also covers the case of equality constraint optimization on the manifold, so if only “certain parts” of your c form a nice manifold, you could to constrained optimization on Manifolds (e.g. Exact Penalty method) still.
Both would however mean that your F can be split a bit maybe.