Evaluating the Hessian of the Lagrangian efficiently

ferrolho · February 4, 2021, 11:04pm

In the context of numerical optimisation, it is good to provide exact Hessians of the problem constraints. I have a working version thanks to Julia’s AD capabilities, but I think my current approach is not optimised for efficiency. I’d be very grateful if someone could give me some tips on how to make it better. Below, I provide more details of my use-case.

Let us consider the following function:

function test!(dx, x)
    dx[1] = x[1] * x[2] + x[3] * x[4]  # constraint 1
    dx[2] = x[1] * x[5] * x[6]         # constraint 2
    dx[3] = x[1] * x[2] + x[3] + x[5]  # constraint 3
    dx
end

test! evaluates three constraints in-place. dx holds the result of each constraint, and x is the vector of decision variables. Let’s make sure it works:

julia> test!(zeros(3), rand(6))
3-element Array{Float64,1}:
 1.3091968395574483
 0.4268997026199477
 2.175969622055919

Great! Let’s move on. What we want to do is to be able to compute Hessians for functions similar to test!, i.e., functions which take a vector as input and output a vector (not just a scalar). The Hessian of the Lagrangian can be written as

\nabla^2 L(x, \lambda) = \sigma \nabla^2 f(x) + \sum_{i=1} \lambda_i \nabla^2 c_i(x)

where f is an objective function, \lambda are the Lagrange multipliers, and c are the problem constraints. For this example, let’s not focus on the first term (the one with the objective function f).

In order to compute the Hessian of test!, we can write the following:

function hessian(f!, x, lambda)
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)
        ForwardDiff.jacobian(f!, dx, x)' * lambda
    end

    ForwardDiff.jacobian(extended_gradient, x)
end

Let’s test it:

julia> hessian(test!, rand(6), rand(3))
6×6 Array{Float64,2}:
 0.0        1.01823  0.0      0.0      0.0675784  0.383813
 1.01823    0.0      0.0      0.0      0.0        0.0
 0.0        0.0      0.0      0.30392  0.0        0.0
 0.0        0.0      0.30392  0.0      0.0        0.0
 0.0675784  0.0      0.0      0.0      0.0        0.474181
 0.383813   0.0      0.0      0.0      0.474181   0.0

Great! The output is exactly what we want: a square and symmetric matrix.

So, what are the issues then? Well, here are the issues I can tell:

We should try to use ForwardDiff.jacobian! instead of ForwardDiff.jacobian, and ideally pass it a JacobianConfig. However, when I tried to create a JacobianConfig for the outer differentiation call beforehand, I realised that it needs extended_gradient as one of its arguments, and that would require knowing lambda beforehand too…
We are using forward-mode differentiation over forward-mode, but from other people’s recommendation we should be using mixed-AD. More specifically, for this case we should be using forward over reverse. However, when I benchmarked both approaches, forward over forward is actually faster… Is this because I am using the most straightforward (but not most efficient) calls, ForwardDiff.jacobian and ReverseDiff.jacobian? (This would then relate to the first item on this list.)
This approach requires creating an extended_gradient every time hessian is called, in order to use the appropriate lambda.
(Am I missing any more issues?)

So, that’s it! I tried to improve each of the points above at a time, but I got stuck on all of them. I’d really appreciate it if someone could give me a hand here… If you are reading this, thank you for sticking with me this far!

If it can be of any help, here’s a link to the full working code: notebook.

Additional information

Here are the benchmarks I mention in the second bullet point:

function hessian1(f!, x, lambda)
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)
        ForwardDiff.jacobian(f!, dx, x)' * lambda
    end

    ForwardDiff.jacobian(extended_gradient, x)
end

function hessian2(f!, x, lambda)
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)
        ReverseDiff.jacobian(f!, dx, x)' * lambda
    end

    ForwardDiff.jacobian(extended_gradient, x)
end

julia> @btime hessian1(test!, rand(6), rand(3))
  1.825 μs (16 allocations: 8.98 KiB)
6×6 Array{Float64,2}:
 0.0       0.993312  0.0       0.0       0.484082  0.19054
 0.993312  0.0       0.0       0.0       0.0       0.0
 0.0       0.0       0.0       0.720973  0.0       0.0
 0.0       0.0       0.720973  0.0       0.0       0.0
 0.484082  0.0       0.0       0.0       0.0       0.608634
 0.19054   0.0       0.0       0.0       0.608634  0.0

julia> @btime hessian2(test!, rand(6), rand(3))
  3.719 μs (84 allocations: 9.94 KiB)
6×6 Array{Float64,2}:
 0.0       0.809991  0.0        0.0        0.236583  0.206275
 0.809991  0.0       0.0        0.0        0.0       0.0
 0.0       0.0       0.0        0.0608866  0.0       0.0
 0.0       0.0       0.0608866  0.0        0.0       0.0
 0.236583  0.0       0.0        0.0        0.0       0.198622
 0.206275  0.0       0.0        0.0        0.198622  0.0

I had thought forward-over-reverse would be faster than forward-over-forward.

And, in case you are wondering about where the lambda comes from: I am using Knitro.jl to solve the NLP problem, which requires the user to define a specific callback where the evaluation of the Hessian takes place. The lambda is passed as an argument to the user-defined callback. (This is not a limitation of Knitro.jl. E.g., Ipopt works in a similar way.) Here’s a simplified example of what that callback would look like:

function cb_eval_h_con_dyn(kc, cb, evalRequest, evalResult, userParams)
    x = evalRequest.x
    lambda = evalRequest.lambda

    hess = hessian(test!, x, lambda)

    # Assigning only the nonzeros (`hess` is sparse)
    evalResult.hess = hess[ind_nonzero]

    return 0
end

contradict · February 5, 2021, 12:44am

Maybe ModelingToolkit.jl would be helpful? For a system like this that consists of simple equations, it can do symbolic operations rather than AD.

using ModelingToolkit

@parameters t y[1:3] λ[1:3]
@variables x[1:3](t)
D = Differential(t)

eqs = [
    D(x[1]) ~ x[1] * x[2] + x[3] * y[1],  # constraint 1
    D(x[2]) ~ x[1] * y[2] * y[3],         # constraint 2
    D(x[3]) ~ x[1] * x[2] + x[3] + y[2],  # constraint 3
]

ℒ = sum(λ.*ModelingToolkit.rhss(eqs))

ℋ = ModelingToolkit.hessian(ℒ, [x;y])

Produces

6×6 Matrix{Num}:
       0  λ₁ + λ₃   0   0     y₃*λ₂     y₂*λ₂
 λ₁ + λ₃        0   0   0         0         0
       0        0   0  λ₁         0         0
       0        0  λ₁   0         0         0
   y₃*λ₂        0   0   0         0  λ₂*x₁(t)
   y₂*λ₂        0   0   0  λ₂*x₁(t)         0

Which can be compiled to a function

hess, _ = ModelingToolkit.build_function(ℋ, [x;y;λ]; expression=Val{false})
hess(rand(9))

6×6 Matrix{Float64}:
 0.0        1.56516  0.0       0.0       0.0490337   0.0661009
 1.56516    0.0      0.0       0.0       0.0         0.0
 0.0        0.0      0.0       0.927298  0.0         0.0
 0.0        0.0      0.927298  0.0       0.0         0.0
 0.0490337  0.0      0.0       0.0       0.0         0.00383799
 0.0661009  0.0      0.0       0.0       0.00383799  0.0

odow · February 5, 2021, 1:27am

I am using Knitro.jl to solve the NLP problem

If you can write the constraints algebraically, use JuMP with Knitro. We will compute the hessian using a sparse reverse-mode AD.

This won’t work if your constraints are user-defined functions, because we disable the hessians.

Edit: I just looked at your notebook. Here’s a similar example of an NLP for optimal control of a rocket: https://nbviewer.jupyter.org/github/jump-dev/JuMPTutorials.jl/blob/master/notebook/modelling/rocket_control.ipynb

ferrolho · February 5, 2021, 8:11am

Thank you for your suggestion, @odow. I am looking for a solution that supports user-defined functions, so JuMP is not a good fit for what I need to do. I tried it before; see here. (Where you also replied! Thank you.)

I think JuMP would work for the notebook I linked, as the constraints can be split easily. (And I will probably do it as a nice exercise.) But the Space Shuttle notebook I have is just a toy-problem to test things out. What I am really looking for is a more general solution, since I want to apply this to the robotics domain, where it would be impractical to write down my constraints with JuMP.

ferrolho · February 5, 2021, 8:33am

Thank you for your reply, @contradict. Indeed, Chris Rackauckas has suggested ModelingToolkit.jl to me before. To be honest, I haven’t tried it yet because I spent a considerable amount of time building the framework I am currently using, and I guess (maybe wrongly) that it would take a lot of effort to start using ModelingToolkit.jl at this point. Perhaps it is time I finally give it a go.

In any case, I am still interested in knowing how to improve my current approach without ModelingToolkit.jl.

ChrisRackauckas · February 5, 2021, 11:56am

Just let it automatically convert your code to symbolic? Zero work should be sufficient?

odow · February 5, 2021, 9:10pm

There are a few packages for optimal control in robotics:

ferrolho · February 5, 2021, 10:58pm

I am familiar with the links in your reply, @odow. TrajectoryOptimization.jl does not allow loading robot models from URDF files as mentioned in this issue, which kind of circles back to the question I asked above. As for JuliaRobotics, there’s only one package for optimal control (TORA.jl) and I am the author.

ferrolho · February 9, 2021, 11:25pm

Hi, Chris! Thank you for chipping in.

Is this what you are referring to? JuliaCon 2020 | Auto-Optimization and Parallelism in DifferentialEquations.jl | Chris Rackauckas - YouTube (From t = 14:12 onwards.) I.e., running symbolic inputs through my already-existing methods to compute symbolic versions of those methods; and after which I’d be able to ask MTK.jl for the Hessians? (And MTK.jl would compute them with the approach it thinks is the most appropriate/efficient?)

ChrisRackauckas · February 20, 2021, 3:48am

Yes exactly. It’ll do the conversion automatically and then you can symbolically generate the Hessian code, so there shouldn’t be any work on your part.

adeyemiadeoye · January 16, 2022, 2:39pm

Hi @contradict, could you please help point to what has now replaced ModelingToolkit.rhss(eqs)?

Thanks.

contradict · January 16, 2022, 6:10pm

I think it has migrated to Symbolics.jl, try ModelingToolkit.Symbolics.rhss or using Symbolics directly.

adeyemiadeoye · January 16, 2022, 6:42pm

Thanks a lot, @contradict.
I managed to create something from ModelingToolkit.jl for the (Lagrangian) problem at hand. But I seek to build a Hessian function that takes some defined function as arguments. In particular, how would you “correctly” write the following using the same idea contained in your previous answer?

@parameters λ
@variables x, y, vec1, vec2, func1(x, y, vec1), func2(x, vec2, vec1)

ℒ = func1 + sum(λ .* func2)
ℋ = ModelingToolkit.hessian(ℒ, [vec1; λ])

hess,_ = ModelingToolkit.build_function(ℋ, [func1;func2;λ]; expression=Val{false})
hess(obj(inp1,inp2,inp3), constr(inp4, inp5, inp6), lbda)

Where obj outputs the objective value, and constr outputs the constraints vector.

Thanks again.

contradict · January 18, 2022, 2:50am

This looks like you may have diverged a bit from the original question. Would you mind creating a new post with a more complete example? Hopefully something I can run and see the error that is giving you trouble.

adeyemiadeoye · January 27, 2022, 9:49am

Thanks a lot. I got some answers to the new topic created.

Topic		Replies	Views
Building a symbolic Hessian function with ModelingToolkit.jl General Usage question , modelingtoolkit	9	691	January 27, 2022
Optim; Hessian or Hessian free? Optimization (Mathematical) package	15	1592	July 11, 2019
Is the Hessian used in non linear JuMP? Optimization (Mathematical) question , jump	3	987	June 21, 2020
JuMP interface access to automatic differentiation (ReverseDiffSparse) Optimization (Mathematical) jump , differentiation	5	2666	February 15, 2018
JuMP, large expression, and hessian "limited-memory" option in Ipopt Optimization (Mathematical) question , jump	8	555	April 9, 2024

Evaluating the Hessian of the Lagrangian efficiently

Additional information

Related topics