Evaluating the Hessian of the Lagrangian efficiently

In the context of numerical optimisation, it is beneficial to provide exact Hessians of the problem constraints. I have a working version thanks to Julia’s AD capabilities, but I suspect my current approach is not as efficient as it could be. I’d be very grateful if someone could give me some tips on how to improve it. Below, I provide more details of my use case.

Let us consider the following function:

function test!(dx, x)
    dx[1] = x[1] * x[2] + x[3] * x[4]  # constraint 1
    dx[2] = x[1] * x[5] * x[6]         # constraint 2
    dx[3] = x[1] * x[2] + x[3] + x[5]  # constraint 3
    dx
end

test! evaluates three constraints in-place. dx holds the result of each constraint, and x is the vector of decision variables. Let’s make sure it works:

julia> test!(zeros(3), rand(6))
3-element Array{Float64,1}:
 1.3091968395574483
 0.4268997026199477
 2.175969622055919

Great! Let’s move on. What we want is to compute Hessians for functions similar to test!, i.e., functions that take a vector as input and return a vector (not just a scalar). The Hessian of the Lagrangian can be written as

\nabla^2 L(x, \lambda) = \sigma \nabla^2 f(x) + \sum_{i=1}^{m} \lambda_i \nabla^2 c_i(x)

where f is the objective function, \sigma is a scaling factor, \lambda are the Lagrange multipliers, and c_i are the m problem constraints. For this example, let’s not focus on the first term (the one with the objective function f).
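As a sanity check (worked out by hand, not part of the original post): for the three constraints in test!, the only nonzero entries of \sum_i \lambda_i \nabla^2 c_i(x), up to symmetry, are

\partial^2 L / \partial x_1 \partial x_2 = \lambda_1 + \lambda_3, \quad \partial^2 L / \partial x_3 \partial x_4 = \lambda_1, \quad \partial^2 L / \partial x_1 \partial x_5 = \lambda_2 x_6, \quad \partial^2 L / \partial x_1 \partial x_6 = \lambda_2 x_5, \quad \partial^2 L / \partial x_5 \partial x_6 = \lambda_2 x_1

which matches the sparsity pattern of the numeric Hessians shown below.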

In order to compute the Hessian of test!, we can write the following:

function hessian(f!, x, lambda)
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)  # 3 = number of constraints (hardcoded for this example)
        ForwardDiff.jacobian(f!, dx, x)' * lambda
    end

    ForwardDiff.jacobian(extended_gradient, x)
end

Let’s test it:

julia> hessian(test!, rand(6), rand(3))
6×6 Array{Float64,2}:
 0.0        1.01823  0.0      0.0      0.0675784  0.383813
 1.01823    0.0      0.0      0.0      0.0        0.0
 0.0        0.0      0.0      0.30392  0.0        0.0
 0.0        0.0      0.30392  0.0      0.0        0.0
 0.0675784  0.0      0.0      0.0      0.0        0.474181
 0.383813   0.0      0.0      0.0      0.474181   0.0

Great! The output is exactly what we want: a square and symmetric matrix.
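As a side note (not in the original post): since \lambda^T c(x) is a scalar, the same matrix can also be obtained with a single call to ForwardDiff.hessian, which may serve as a useful reference point when benchmarking. A minimal sketch, reusing test! from above:

```julia
using ForwardDiff, LinearAlgebra

function test!(dx, x)
    dx[1] = x[1] * x[2] + x[3] * x[4]  # constraint 1
    dx[2] = x[1] * x[5] * x[6]         # constraint 2
    dx[3] = x[1] * x[2] + x[3] + x[5]  # constraint 3
    dx
end

# λᵀc(x) is a scalar, so ForwardDiff.hessian applies directly.
function lagrangian_hessian(f!, x, lambda)
    g(z) = dot(lambda, f!(zeros(eltype(z), length(lambda)), z))
    ForwardDiff.hessian(g, x)
end

lagrangian_hessian(test!, rand(6), rand(3))
```

This avoids the explicit Jacobian-transpose-times-lambda step, though it still allocates a fresh closure per call.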

So, what are the issues then? Well, here are the issues I can identify:

  • We should try to use ForwardDiff.jacobian! instead of ForwardDiff.jacobian, and ideally pass it a JacobianConfig. However, when I tried to create a JacobianConfig for the outer differentiation call beforehand, I realised that it needs extended_gradient as one of its arguments, and that would require knowing lambda beforehand too…
  • We are using forward-over-forward differentiation, but others have recommended mixed-mode AD; specifically, forward over reverse for this case. However, when I benchmarked both approaches, forward over forward was actually faster… Is this because I am using the most straightforward (but not most efficient) calls, ForwardDiff.jacobian and ReverseDiff.jacobian? (This would then relate to the first item on this list.)
  • This approach requires creating a new extended_gradient closure every time hessian is called, in order to capture the appropriate lambda.
  • (Am I missing any more issues?)
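To make the first and third points concrete, here is a hedged sketch (hypothetical names, not a definitive implementation) of one possible workaround: a callable struct that stores lambda, so the same object, and hence a single JacobianConfig built up front, can be reused across calls, with the multipliers updated in place:

```julia
using ForwardDiff

function test!(dx, x)
    dx[1] = x[1] * x[2] + x[3] * x[4]  # constraint 1
    dx[2] = x[1] * x[5] * x[6]         # constraint 2
    dx[3] = x[1] * x[2] + x[3] + x[5]  # constraint 3
    dx
end

# Callable struct holding the multipliers, so `lambda` can be updated in place
# instead of recreating a closure (and its JacobianConfig) on every call.
struct ExtendedGradient{F,L}
    f!::F
    lambda::L
end

function (eg::ExtendedGradient)(x::AbstractArray{T}) where {T}
    dx = zeros(T, 3)  # 3 = number of constraints
    ForwardDiff.jacobian(eg.f!, dx, x)' * eg.lambda
end

function hessian!(H, eg::ExtendedGradient, x, lambda, cfg)
    eg.lambda .= lambda  # overwrite the stored multipliers in place
    ForwardDiff.jacobian!(H, eg, x, cfg)
end

eg  = ExtendedGradient(test!, zeros(3))
cfg = ForwardDiff.JacobianConfig(eg, zeros(6))  # built once, before `lambda` is known
H   = zeros(6, 6)
hessian!(H, eg, rand(6), rand(3), cfg)
```

Because the config is tagged with the struct instance rather than a per-call closure, it only needs to be constructed once.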

So, that’s it! I tried to tackle each of the points above one at a time, but I got stuck on all of them. I’d really appreciate it if someone could give me a hand here… If you are reading this, thank you for sticking with me this far! :slight_smile:

If it can be of any help, here’s a link to the full working code: notebook.

Additional information


Here are the benchmarks I mention in the second bullet point:

function hessian1(f!, x, lambda)
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)
        ForwardDiff.jacobian(f!, dx, x)' * lambda
    end

    ForwardDiff.jacobian(extended_gradient, x)
end

function hessian2(f!, x, lambda)
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)
        ReverseDiff.jacobian(f!, dx, x)' * lambda
    end

    ForwardDiff.jacobian(extended_gradient, x)
end

julia> @btime hessian1(test!, rand(6), rand(3))
  1.825 μs (16 allocations: 8.98 KiB)
6×6 Array{Float64,2}:
 0.0       0.993312  0.0       0.0       0.484082  0.19054
 0.993312  0.0       0.0       0.0       0.0       0.0
 0.0       0.0       0.0       0.720973  0.0       0.0
 0.0       0.0       0.720973  0.0       0.0       0.0
 0.484082  0.0       0.0       0.0       0.0       0.608634
 0.19054   0.0       0.0       0.0       0.608634  0.0

julia> @btime hessian2(test!, rand(6), rand(3))
  3.719 μs (84 allocations: 9.94 KiB)
6×6 Array{Float64,2}:
 0.0       0.809991  0.0        0.0        0.236583  0.206275
 0.809991  0.0       0.0        0.0        0.0       0.0
 0.0       0.0       0.0        0.0608866  0.0       0.0
 0.0       0.0       0.0608866  0.0        0.0       0.0
 0.236583  0.0       0.0        0.0        0.0       0.198622
 0.206275  0.0       0.0        0.0        0.198622  0.0

I had thought forward-over-reverse would be faster than forward-over-forward.


And, in case you are wondering where the lambda comes from: I am using Knitro.jl to solve the NLP, which requires the user to define a callback in which the Hessian is evaluated. The lambda is passed as an argument to this user-defined callback. (This is not specific to Knitro.jl; Ipopt, for example, works in a similar way.) Here’s a simplified example of what that callback might look like:

function cb_eval_h_con_dyn(kc, cb, evalRequest, evalResult, userParams)
    x = evalRequest.x
    lambda = evalRequest.lambda

    hess = hessian(test!, x, lambda)

    # Assigning only the nonzeros (`hess` is sparse)
    evalResult.hess = hess[ind_nonzero]

    return 0
end
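For reference, ind_nonzero is not defined in the snippet above; a hypothetical way to build it (assuming the solver expects, say, the nonzeros of the upper triangle, and that the sparsity pattern is detected once from a representative Hessian) could be:

```julia
# Hypothetical helper (not from the original post): collect the indices of the
# structurally nonzero upper-triangular entries of a dense Hessian, so that
# `hess[ind_nonzero]` yields the vector of nonzeros the solver expects.
function upper_triangular_nonzeros(H::AbstractMatrix)
    [CartesianIndex(i, j) for j in 1:size(H, 2) for i in 1:j if H[i, j] != 0]
end
```

Detecting the pattern from a single evaluation assumes no accidental zeros at that point; a safer variant would union the patterns of a few random evaluations.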

Maybe ModelingToolkit.jl would be helpful? For a system like this that consists of simple equations, it can do symbolic operations rather than AD.

using ModelingToolkit

@parameters t y[1:3] λ[1:3]
@variables x[1:3](t)
D = Differential(t)

eqs = [
    D(x[1]) ~ x[1] * x[2] + x[3] * y[1],  # constraint 1
    D(x[2]) ~ x[1] * y[2] * y[3],         # constraint 2
    D(x[3]) ~ x[1] * x[2] + x[3] + y[2],  # constraint 3
]

ℒ = sum(λ .* ModelingToolkit.rhss(eqs))

ℋ = ModelingToolkit.hessian(ℒ, [x;y])

Produces

6×6 Matrix{Num}:
       0  λ₁ + λ₃   0   0     y₃*λ₂     y₂*λ₂
 λ₁ + λ₃        0   0   0         0         0
       0        0   0  λ₁         0         0
       0        0  λ₁   0         0         0
   y₃*λ₂        0   0   0         0  λ₂*x₁(t)
   y₂*λ₂        0   0   0  λ₂*x₁(t)         0

Which can be compiled to a function

hess, _ = ModelingToolkit.build_function(ℋ, [x;y;λ]; expression=Val{false})
hess(rand(9))

6×6 Matrix{Float64}:
 0.0        1.56516  0.0       0.0       0.0490337   0.0661009
 1.56516    0.0      0.0       0.0       0.0         0.0
 0.0        0.0      0.0       0.927298  0.0         0.0
 0.0        0.0      0.927298  0.0       0.0         0.0
 0.0490337  0.0      0.0       0.0       0.0         0.00383799
 0.0661009  0.0      0.0       0.0       0.00383799  0.0

I am using Knitro.jl to solve the NLP problem

If you can write the constraints algebraically, use JuMP with Knitro. JuMP will compute the Hessian for you using sparse reverse-mode AD.

This won’t work if your constraints are user-defined functions, because we disable Hessians for those.
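As an illustration (not from the original exchange), the three toy constraints from test! written algebraically in JuMP might look like this; the right-hand sides and the solver attachment are placeholders:

```julia
using JuMP

# Sketch: the same three constraints, written algebraically so JuMP can
# exploit its own sparse AD. The right-hand sides (== 1) are arbitrary.
model = Model()
@variable(model, x[1:6])
@NLconstraint(model, x[1] * x[2] + x[3] * x[4] == 1)
@NLconstraint(model, x[1] * x[5] * x[6] == 1)
@NLconstraint(model, x[1] * x[2] + x[3] + x[5] == 1)
```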

Edit: I just looked at your notebook. Here’s a similar example of an NLP for optimal control of a rocket: Jupyter Notebook Viewer


Thank you for your suggestion, @odow. I am looking for a solution that supports user-defined functions, so JuMP is not a good fit for what I need to do. I tried it before; see here. (Where you also replied! Thank you.) :slight_smile:

I think JuMP would work for the notebook I linked, as the constraints can be split easily. (And I will probably do it as a nice exercise.) But the Space Shuttle notebook is just a toy problem to test things out. What I am really looking for is a more general solution, since I want to apply this to the robotics domain, where it would be impractical to write down my constraints with JuMP.

Thank you for your reply, @contradict. Indeed, Chris Rackauckas has suggested ModelingToolkit.jl to me before. To be honest, I haven’t tried it yet because I spent a considerable amount of time building the framework I am currently using, and I guess (maybe wrongly) that it would take a lot of effort to start using ModelingToolkit.jl at this point. Perhaps it is time I finally give it a go.

In any case, I am still interested in knowing how to improve my current approach without ModelingToolkit.jl.

Just let it automatically convert your code to symbolic? Zero work should be sufficient?

There are a few packages for optimal control in robotics:

I am familiar with the links in your reply, @odow. TrajectoryOptimization.jl does not allow loading robot models from URDF files, as mentioned in this issue, which kind of circles back to the question I asked above. As for JuliaRobotics, there’s only one package for optimal control (TORA.jl), and I am its author.


Hi, Chris! Thank you for chipping in.

Is this what you are referring to? https://youtu.be/UNkXNZZ3hSw?t=852 (From t = 14:12 onwards.) I.e., running symbolic inputs through my already-existing methods to compute symbolic versions of those methods; and after which I’d be able to ask MTK.jl for the Hessians? (And MTK.jl would compute them with the approach it thinks is the most appropriate/efficient?)

Yes exactly. It’ll do the conversion automatically and then you can symbolically generate the Hessian code, so there shouldn’t be any work on your part.
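For completeness, a rough sketch of what that tracing might look like with Symbolics.jl (the symbolic engine underlying MTK), reusing test! from above; the exact API may vary across versions:

```julia
using Symbolics

function test!(dx, x)
    dx[1] = x[1] * x[2] + x[3] * x[4]  # constraint 1
    dx[2] = x[1] * x[5] * x[6]         # constraint 2
    dx[3] = x[1] * x[2] + x[3] + x[5]  # constraint 3
    dx
end

@variables x[1:6] λ[1:3]
xs, λs = collect(x), collect(λ)

# Trace the existing Julia function with symbolic inputs...
dx = test!(Vector{Num}(undef, 3), xs)

# ...then build the weighted Lagrangian term and its symbolic Hessian.
ℒ = sum(λs .* dx)
ℋ = Symbolics.hessian(ℒ, xs)

# Compile to a fast numeric function of the concatenated input [x; λ].
hess_oop, hess_ip = Symbolics.build_function(ℋ, vcat(xs, λs); expression = Val{false})
hess_oop(rand(9))  # 6×6 numeric Hessian
```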