In the context of numerical optimisation, it is often beneficial to provide exact Hessians of the problem constraints. I have a working version thanks to Julia’s AD capabilities, but I don’t think my current approach is efficient. I’d be very grateful if someone could give me some tips on how to improve it. Below, I provide more details of my use case.

Let us consider the following function:

```
function test!(dx, x)
    dx[1] = x[1] * x[2] + x[3] * x[4] # constraint 1
    dx[2] = x[1] * x[5] * x[6]        # constraint 2
    dx[3] = x[1] * x[2] + x[3] + x[5] # constraint 3
    dx
end
```

`test!` evaluates three constraints in place: `dx` holds the result of each constraint, and `x` is the vector of decision variables. Let’s make sure it works:

```
julia> test!(zeros(3), rand(6))
3-element Array{Float64,1}:
1.3091968395574483
0.4268997026199477
2.175969622055919
```

Great! Let’s move on. What we want is to be able to compute Hessians for functions similar to `test!`, i.e., functions which take a vector as input and output a vector (not just a scalar). The Hessian of the Lagrangian can be written as

$$\nabla^2_{xx} L(x, \lambda) = \nabla^2 f(x) + \sum_i \lambda_i \nabla^2 c_i(x),$$

where $f$ is the objective function, $\lambda$ are the Lagrange multipliers, and $c$ are the problem constraints. For this example, let’s not focus on the first term (the one with the objective function $f$).
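To make the constraint term concrete, here is $\sum_i \lambda_i \nabla^2 c_i(x)$ for the three constraints of `test!`, differentiated by hand (pure Julia, no AD; the helper name `lagrangian_hessian_analytic` is mine, just for illustration):

```
# Hand-derived sum of lambda_i * ∇²c_i(x) for the constraints in `test!`.
# (Illustrative helper, not part of any package.)
function lagrangian_hessian_analytic(x, lambda)
    H = zeros(6, 6)
    # c1 = x1*x2 + x3*x4:  ∂²/∂x1∂x2 = 1, ∂²/∂x3∂x4 = 1
    H[1, 2] += lambda[1]; H[2, 1] += lambda[1]
    H[3, 4] += lambda[1]; H[4, 3] += lambda[1]
    # c2 = x1*x5*x6: mixed second derivatives are x6, x5, and x1
    H[1, 5] += lambda[2] * x[6]; H[5, 1] += lambda[2] * x[6]
    H[1, 6] += lambda[2] * x[5]; H[6, 1] += lambda[2] * x[5]
    H[5, 6] += lambda[2] * x[1]; H[6, 5] += lambda[2] * x[1]
    # c3 = x1*x2 + x3 + x5:  ∂²/∂x1∂x2 = 1; the linear terms vanish
    H[1, 2] += lambda[3]; H[2, 1] += lambda[3]
    H
end
```

For any `x` and `lambda`, this should agree with the AD-based result to machine precision, which makes for a handy sanity check; note how the sparsity pattern matches the AD output shown further down.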

In order to compute the Hessian of `test!`, we can write the following:

```
function hessian(f!, x, lambda)
    # Gradient of lambda' * c(x), written as Jc(x)' * lambda
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)  # output buffer (3 constraints, hardcoded here)
        ForwardDiff.jacobian(f!, dx, x)' * lambda
    end
    # Jacobian of the gradient = Hessian of lambda' * c(x)
    ForwardDiff.jacobian(extended_gradient, x)
end
```

Let’s test it:

```
julia> hessian(test!, rand(6), rand(3))
6×6 Array{Float64,2}:
0.0 1.01823 0.0 0.0 0.0675784 0.383813
1.01823 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.30392 0.0 0.0
0.0 0.0 0.30392 0.0 0.0 0.0
0.0675784 0.0 0.0 0.0 0.0 0.474181
0.383813 0.0 0.0 0.0 0.474181 0.0
```

Great! The output is exactly what we want: a square and symmetric matrix.

So, what are the issues then? Well, here are the ones I can identify:

- We should try to use `ForwardDiff.jacobian!` instead of `ForwardDiff.jacobian`, and ideally pass it a `JacobianConfig`. However, when I tried to create a `JacobianConfig` for the outer differentiation call beforehand, I realised that it needs `extended_gradient` as one of its arguments, and that would require knowing `lambda` beforehand too…
- We are using forward-mode differentiation over forward mode, but other people recommend mixed-mode AD; more specifically, for this case we should be using forward over reverse. However, when I benchmarked both approaches, forward over forward was actually faster… Is this because I am using the most straightforward (but not most efficient) calls, `ForwardDiff.jacobian` and `ReverseDiff.jacobian`? (This would then relate to the first item on this list.)
- This approach requires creating a new `extended_gradient` closure every time `hessian` is called, in order to use the appropriate `lambda`.
- (Am I missing any more issues?)
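For what it’s worth, one direction I can imagine for the first and third bullets together is to replace the closure with a callable struct that owns its own `lambda` buffer. The function object then exists before `lambda` is known, so a `JacobianConfig` can be built from it once, and each call just overwrites the buffer. This is only a sketch (the names `ExtendedGradient` and `hessian!` are mine, not package API, and the inner `ForwardDiff.jacobian` call still builds its own config each time):

```
using ForwardDiff

# Sketch: a callable struct replaces the per-call closure.
# `ExtendedGradient` and `hessian!` are illustrative names, not package API.
struct ExtendedGradient{F}
    f!::F
    lambda::Vector{Float64}  # overwritten in place before each Hessian evaluation
    m::Int                   # number of constraints
end

# Gradient of lambda' * c(x), written as Jc(x)' * lambda
function (g::ExtendedGradient)(x::AbstractArray{T}) where {T}
    dx = zeros(T, g.m)
    ForwardDiff.jacobian(g.f!, dx, x)' * g.lambda
end

# Per call: refresh the multipliers, then differentiate in place.
function hessian!(H, g, cfg, x, lambda)
    copyto!(g.lambda, lambda)
    ForwardDiff.jacobian!(H, g, x, cfg)
end
```

The reusable pieces would be built once, outside the solver loop, e.g. `g = ExtendedGradient(test!, zeros(3), 3)`, `cfg = ForwardDiff.JacobianConfig(g, zeros(6))`, and `H = zeros(6, 6)`; each callback invocation then only does `hessian!(H, g, cfg, x, lambda)`.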

So, that’s it! I tried to improve each of the points above, one at a time, but I got stuck on all of them. I’d really appreciate it if someone could give me a hand here… If you are reading this, thank you for sticking with me this far!

If it can be of any help, here’s a link to the full working code: notebook.

### Additional information

Here are the benchmarks I mention in the second bullet point:

```
function hessian1(f!, x, lambda)
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)
        ForwardDiff.jacobian(f!, dx, x)' * lambda
    end
    ForwardDiff.jacobian(extended_gradient, x)
end

function hessian2(f!, x, lambda)
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)
        ReverseDiff.jacobian(f!, dx, x)' * lambda
    end
    ForwardDiff.jacobian(extended_gradient, x)
end
```

```
julia> @btime hessian1(test!, rand(6), rand(3))
1.825 μs (16 allocations: 8.98 KiB)
6×6 Array{Float64,2}:
0.0 0.993312 0.0 0.0 0.484082 0.19054
0.993312 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.720973 0.0 0.0
0.0 0.0 0.720973 0.0 0.0 0.0
0.484082 0.0 0.0 0.0 0.0 0.608634
0.19054 0.0 0.0 0.0 0.608634 0.0
julia> @btime hessian2(test!, rand(6), rand(3))
3.719 μs (84 allocations: 9.94 KiB)
6×6 Array{Float64,2}:
0.0 0.809991 0.0 0.0 0.236583 0.206275
0.809991 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0608866 0.0 0.0
0.0 0.0 0.0608866 0.0 0.0 0.0
0.236583 0.0 0.0 0.0 0.0 0.198622
0.206275 0.0 0.0 0.0 0.198622 0.0
```

I had thought *forward-over-reverse* would be faster than *forward-over-forward*.

And, in case you are wondering where the `lambda` comes from: I am using Knitro.jl to solve the NLP, which requires the user to define a specific callback where the evaluation of the Hessian takes place. The `lambda` is passed as an argument to that user-defined callback. (This is not a limitation of Knitro.jl; e.g., Ipopt works in a similar way.) Here’s a simplified example of what that callback would look like:

```
function cb_eval_h_con_dyn(kc, cb, evalRequest, evalResult, userParams)
    x = evalRequest.x
    lambda = evalRequest.lambda
    hess = hessian(test!, x, lambda)
    # Assigning only the nonzeros (`hess` is sparse)
    evalResult.hess = hess[ind_nonzero]
    return 0
end
```