where \cdot is the dot product, J(\mathbf{x}) is the objective function, and the nonlinear equality constraints are \mathbf{f}(\mathbf{x}) = \mathbf{0}.
My understanding is that MOI will compute \nabla^2 J(\mathbf{x}) + \sum_i \mu_i \nabla^2 f_i(\mathbf{x}) internally before passing it to the solver (if the Hessian of the objective function is also provided). Am I right?
The MOI.VectorNonlinearOracle also requires functions to evaluate the equality constraints \mathbf{f}(\mathbf{x}) and their Jacobian \nabla \mathbf{f}(\mathbf{x}). When differentiating with e.g. DifferentiationInterface.jl and ForwardDiff.jl, we can compute both efficiently with value_and_jacobian. Now, if I compute the \sum_i \mu_i \nabla^2 f_i(\mathbf{x}) matrix with value_gradient_and_hessian, can I reuse the intermediate results (the value and the gradient) to construct the \mathbf{f}(\mathbf{x}) vector and the \nabla \mathbf{f}(\mathbf{x}) Jacobian? My feeling right now is no, so I should just compute \sum_i \mu_i \nabla^2 f_i(\mathbf{x}) with hessian directly, since the intermediate results are useless here.
The Jacobian can be dense or sparse, depending on whether I use a single shooting or multiple shooting transcription. The Hessian will always be sparse.
But my question is not about sparsity; it is about avoiding redundant computations by calling value_and_jacobian and value_gradient_and_hessian judiciously.
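For concreteness, here is a minimal sketch of the two calls in question (the constraint function and the values of x and \boldsymbol{\mu} are toy placeholders, not from any real model):

```julia
using DifferentiationInterface, LinearAlgebra
import ForwardDiff

backend = AutoForwardDiff()
f(x) = [x[1]^2 + x[2], x[1] * x[2]]  # toy equality constraints f(x) = 0
x = [1.0, 2.0]                        # current primal point
μ = [0.5, -1.0]                       # current multipliers

# value and Jacobian of f in one sweep:
F, J = value_and_jacobian(f, backend, x)

# value, gradient, and Hessian of ℓ(x) = μ ⋅ f(x) in one sweep:
ℓ(xv) = dot(μ, f(xv))
v, g, H = value_gradient_and_hessian(ℓ, backend, x)
```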
In other words, the VectorNonlinearOracle wants three functions that compute, respectively:

1. the value of \mathbf{f}(\mathbf{x})
2. the Jacobian of \mathbf{f}(\mathbf{x})
3. the Hessian of the function \ell(\mathbf{x}, \boldsymbol{\mu}) = \boldsymbol{\mu} \cdot \mathbf{f}(\mathbf{x})
We can compute 1. and 2. efficiently in a single value_and_jacobian call. Now, are the value and the gradient of \ell(\mathbf{x}, \boldsymbol{\mu}) completely useless, or can they be used to compute 1. and 2.? My feeling right now is that they are useless, since the dot product makes it impossible to recover the \mathbf{f}(\mathbf{x}) value.
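Writing it out makes the aggregation explicit:

\ell(\mathbf{x}) = \boldsymbol{\mu} \cdot \mathbf{f}(\mathbf{x}) \in \mathbb{R}, \qquad \nabla \ell(\mathbf{x}) = \nabla \mathbf{f}(\mathbf{x})^\top \boldsymbol{\mu}

so the value and the gradient only give the \boldsymbol{\mu}-weighted combinations, not \mathbf{f}(\mathbf{x}) or \nabla \mathbf{f}(\mathbf{x}) themselves.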
edit: I think @odow should be able to answer my question easily.
To answer my own question here: clearly it does. I had difficulties finding good references on this, but the MATLAB documentation is great on this subject. The Lagrangian of the equality-constrained optimization problem above is:

L(\mathbf{x}, \boldsymbol{\mu}) = J(\mathbf{x}) + \boldsymbol{\mu} \cdot \mathbf{f}(\mathbf{x}) = J(\mathbf{x}) + \sum_i \mu_i f_i(\mathbf{x})
meaning that the Hessian of the Lagrangian is really:
\nabla^2 L = \nabla^2 J(\mathbf{x}) + \sum_i \mu_i \nabla^2 f_i(\mathbf{x})
I think it would be a good idea to add this little piece of information to the VectorNonlinearOracle docstring or somewhere else. Maybe just a sentence, e.g. "The Hessian of the Lagrangian for a given model is the sum of all VectorNonlinearOracle Hessians and all nonlinear operator Hessians."
Ideally, you also want a multiplier for the objective, which the optimization solver may need to steer (for example in \ell_1 relaxation or feasibility restoration).
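For instance, Ipopt's Hessian callback includes an objective scaling factor \sigma_f (its obj_factor argument), so the quantity the solver actually requests is:

\nabla^2 L = \sigma_f \nabla^2 J(\mathbf{x}) + \sum_i \mu_i \nabla^2 f_i(\mathbf{x})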
Yeah, the idea is that you can sometimes reuse the results of a second-order derivative computation because it produces first-order quantities as a byproduct. Here, however, all the constraints are mixed together by the dot product, which makes this impossible. You could do something if you computed the Hessian of each f_i individually, but that would be wasteful.
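To illustrate the wasteful alternative (reusing the toy f, μ, x, and backend from the sketch above), the per-constraint route computes one Hessian sweep per constraint, where a single scalar Hessian suffices:

```julia
# Wasteful: one Hessian per constraint fᵢ, then weight by μᵢ and sum.
H_sum = sum(μ[i] * hessian(xv -> f(xv)[i], backend, x) for i in eachindex(μ))

# Cheaper: a single Hessian of the scalar ℓ(xv) = μ ⋅ f(xv).
H_dot = hessian(xv -> dot(μ, f(xv)), backend, x)

H_sum ≈ H_dot  # same matrix
```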
About filling ret in the Hessian-of-the-Lagrangian function: should I fill only the upper or lower triangular part? Or will it also work if I fill both parts?
ret is an AbstractVector{Float64}. You must supply the sparsity structure using the hessian_lagrangian_structure kwarg. Then ret holds the corresponding non-zeros. Since the matrix is symmetric, you do not need to provide both triangles. Matching elements (i, j) and (j, i) will be summed together.
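For what it's worth, here is a hypothetical sketch of such a callback, assuming an in-place signature (ret, x, μ) and reusing the toy f and backend from above; the structure shown stores only the lower triangle of a dense 2×2 Hessian:

```julia
# Hypothetical (row, col) sparsity pattern: lower triangle only.
structure = [(1, 1), (2, 1), (2, 2)]

function eval_hessian_lagrangian(ret, x, μ)
    H = hessian(xv -> dot(μ, f(xv)), backend, x)  # dense Hessian of ℓ(xv) = μ ⋅ f(xv)
    for (k, (i, j)) in enumerate(structure)
        ret[k] = H[i, j]  # one value per structural non-zero, single triangle
    end
    return nothing
end
```

In a real implementation you would presumably exploit the sparsity of the Hessian and prepare the differentiation once up front instead of redoing it on every call.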
I can make a PR to improve the docstring and make this clearer.