Computing the Hessian of the Lagrangian efficiently with DI.jl

Hello,

The new MOI.VectorNonlinearOracle supports providing the Hessian of the Lagrangian function, defined according to the MOI.jl docs as:

\sum_i \mu_i \nabla^2 f_i(\mathbf{x})

but the Lagrangian typically also includes the objective function:

L(\mathbf{x}, \boldsymbol{\mu}) = J(\mathbf{x}) + \boldsymbol{\mu} \cdot \mathbf{f}(\mathbf{x})

where \cdot is the dot product, J(\mathbf{x}) is the objective function, and the nonlinear equality constraints are defined as \mathbf{f}(\mathbf{x}) = \mathbf{0}.

  1. My understanding is that MOI will do \nabla^2 J(\mathbf{x}) + \sum_i \mu_i \nabla^2 f_i(\mathbf{x}) internally before passing it to the solver (if the Hessian of the objective function is also provided), am I right?
  2. The MOI.VectorNonlinearOracle also requires functions to evaluate the equality constraint \mathbf{f}(\mathbf{x}) and its Jacobian \nabla \mathbf{f}(\mathbf{x}). When computing the Jacobian with e.g. DifferentiationInterface.jl and ForwardDiff.jl, we can compute both efficiently with value_and_jacobian. Now, if I compute the \sum_i \mu_i \nabla^2 f_i(\mathbf{x}) matrix with value_gradient_and_hessian, can I reuse the intermediate results (the value and the gradient) to construct the \mathbf{f}(\mathbf{x}) vector and the \nabla \mathbf{f}(\mathbf{x}) Jacobian? My feeling right now is no, so I should just compute \sum_i \mu_i \nabla^2 f_i(\mathbf{x}) with hessian directly, since the intermediate results are useless here.

Thanks!

There’s nothing built into DI at the moment.
Are your Jacobians and Hessians sparse or dense?

The Jacobian can be dense or sparse, depending on whether I use a single shooting or multiple shooting transcription. The Hessian will always be sparse.

But my question is not about sparsity; it is about avoiding redundant computations by using value_and_jacobian and value_gradient_and_hessian judiciously.

In other words, the VectorNonlinearOracle wants three functions that compute, respectively:

  1. The value of \mathbf{f}(\mathbf{x})
  2. The Jacobian of \mathbf{f}(\mathbf{x})
  3. The Hessian of the function \ell(\mathbf{x}, \boldsymbol{\mu}) = \boldsymbol{\mu} \cdot \mathbf{f}(\mathbf{x})

We can compute 1. and 2. efficiently in a single value_and_jacobian call. Now, are the value and the gradient of \ell(\mathbf{x}, \boldsymbol{\mu}) completely useless, or can they be used to compute 1. and 2.? My feeling right now is that they are useless, since the dot product makes it impossible to recover the \mathbf{f}(\mathbf{x}) value.
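
For concreteness, here is a minimal sketch of the two call patterns under discussion (the toy constraint function and the backend choice are mine, not part of the oracle API):

using DifferentiationInterface
import ForwardDiff

f(x) = [x[1]^2 + x[2]^2 - 1, x[1] * x[2]]  # toy equality constraints f(x) = 0
ℓ(x, μ) = μ' * f(x)                        # scalar ℓ(x, μ) = μ ⋅ f(x)

backend = AutoForwardDiff()
x, μ = [1.0, 2.0], [0.5, -1.0]

# 1. and 2.: the constraint value and its Jacobian share a single sweep
fx, Jf = value_and_jacobian(f, backend, x)

# 3.: the value and gradient returned here belong to the scalar ℓ,
# not to the vector f, so they cannot be reused to rebuild fx or Jf
ℓx, gℓ, Hℓ = value_gradient_and_hessian(z -> ℓ(z, μ), backend, x)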

edit: I think @odow should be able to answer my question easily.

To answer my own first question here: clearly it does. I had difficulty finding good references on this, but the MATLAB documentation is great on the subject. The Lagrangian of the equality-constrained optimization problem above is:

L(\mathbf{x}, \boldsymbol{\mu}) = J(\mathbf{x}) + \boldsymbol{\mu} \cdot \mathbf{f}(\mathbf{x})

meaning that the Hessian of the Lagrangian is really:

\nabla^2 L = \nabla^2 J(\mathbf{x}) + \sum_i \mu_i \nabla^2 f_i(\mathbf{x})

I think it would be a good idea to add this little piece of information to the VectorNonlinearOracle docstring or somewhere else. Maybe just a sentence, e.g. “The Hessian of the Lagrangian for a given model is the sum of all VectorNonlinearOracle Hessians and all nonlinear operator Hessians.”

Ideally, you also want a multiplier for the objective, which the optimization solver may need to steer (for example in \ell_1 relaxation or feasibility restoration).
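
Concretely, with an objective multiplier \sigma (notation mine, mirroring Ipopt's objective scaling factor), the Lagrangian becomes:

L(\mathbf{x}, \sigma, \boldsymbol{\mu}) = \sigma J(\mathbf{x}) + \boldsymbol{\mu} \cdot \mathbf{f}(\mathbf{x})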

Yeah, the idea is that you can sometimes reuse the results of a second-order derivative because it computes first-order quantities as a byproduct. Here, however, all constraints are mixed together, which makes this impossible. You could do something if you were to compute the Hessian of each f_i individually, but that would be wasteful.

Not sure if I need to chime in with anything here, but:

My understanding is that MOI will do … before passing it to the solver

Correct.

In the Ipopt.jl case, we compute most of the Hessian of the Lagrangian for the objective and any quadratic and scalar nonlinear constraints, and then we add on the terms that correspond to the VectorNonlinearOracle constraints: Ipopt.jl/ext/IpoptMathOptInterfaceExt/MOI_wrapper.jl at 3a7aed2dbc012a83a227dc1726ac791110e5a6f8 · jump-dev/Ipopt.jl · GitHub

Ideally, you also want a multiplier for the objective

The MOI API has this: Nonlinear programming · MathOptInterface.

It’s not applicable for VectorNonlinearOracle because there is no objective component, so we just have \mu^\top f(x).

You could do something if you were to compute the Hessian of each f_i individually, but that would be wasteful.

Yeah, the best way is probably to define a new function like

lagrange(x, u) = u' * set.eval_f(x)

and then compute a single Hessian of lagrange, in pseudo-code:

eval_hessian_lagrangian(ret, x, u) = ret .= vec(DI.hessian(z -> lagrange(z, u), backend, x))

You’d want to re-use any work for different x and u, but I don’t know how to do that in DI. And you want sparsity if it exists.
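
For what it's worth, here is a runnable version of that sketch. If I understand DI correctly, passing u as a Constant context lets a single preparation be reused across different values of both x and u (the constraint function and backend below are placeholders for set.eval_f and your AD choice):

using DifferentiationInterface
import ForwardDiff

f(x) = [x[1]^2 + x[2]^2 - 1, x[1] * x[2]]  # stand-in for set.eval_f
lagrange(x, u) = u' * f(x)

backend = AutoForwardDiff()
x0, u0 = [1.0, 2.0], [3.0, 4.0]

# Prepare once; the same preparation then works for new x and u of the same size.
prep = prepare_hessian(lagrange, backend, x0, Constant(u0))

function eval_hessian_lagrangian(ret, x, u)
    H = hessian(lagrange, prep, backend, x, Constant(u))
    ret .= vec(H)  # dense flattening; with sparsity, copy only the stored nonzeros
    return ret
end

eval_hessian_lagrangian(zeros(4), x0, u0)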

Thanks a lot, it does help!

About filling ret in the Hessian of the Lagrangian function: should I fill only the upper or lower triangular part? Or will it also work if I fill both parts?

See the docstring: Standard form · MathOptInterface

ret is an AbstractVector{Float64}. You must supply the sparsity structure using the hessian_lagrangian_structure kwarg. Then ret contains the corresponding non-zeros. Since the matrix is symmetric, you do not need to provide both triangles. Matching elements (i, j) and (j, i) will be summed together.

I can make a PR to improve the docstring and make this clearer.

What I meant is: should I provide e.g. both (1, 2) and (2, 1) in the hessian_lagrangian_structure, or should just providing (1, 2) work?

Providing (1, 2) is enough.
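
To illustrate with a toy example (mine, not from the thread): for \mathbf{f}(\mathbf{x}) = (x_1^2 + x_2^2 - 1, \; x_1 x_2), the Hessian of \ell has nonzeros 2\mu_1 at (1, 1) and (2, 2), and \mu_2 at (2, 1), so one triangle alone suffices:

# One triangle is enough; duplicate (i, j)/(j, i) entries would be summed together.
hessian_lagrangian_structure = [(1, 1), (2, 1), (2, 2)]

function eval_hessian_lagrangian(ret, x, μ)
    ret[1] = 2μ[1]  # entry (1, 1)
    ret[2] = μ[2]   # entry (2, 1)
    ret[3] = 2μ[1]  # entry (2, 2)
    return
end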
