In the context of numerical optimisation, it is often beneficial to provide exact Hessians of the problem constraints to the solver. I have a working version thanks to Julia’s AD capabilities, but I don’t think my current approach is particularly efficient. I’d be very grateful if someone could give me some tips on how to make it better. Below, I provide more details of my use case.
Let us consider the following function:
function test!(dx, x)
    dx[1] = x[1] * x[2] + x[3] * x[4]  # constraint 1
    dx[2] = x[1] * x[5] * x[6]         # constraint 2
    dx[3] = x[1] * x[2] + x[3] + x[5]  # constraint 3
    dx
end
`test!` evaluates the three constraints in-place: `dx` holds the result of each constraint, and `x` is the vector of decision variables. Let’s make sure it works:
julia> test!(zeros(3), rand(6))
3-element Array{Float64,1}:
1.3091968395574483
0.4268997026199477
2.175969622055919
Great! Let’s move on. What we want is to compute Hessians for functions similar to `test!`, i.e., functions which take a vector as input and output a vector (not just a scalar). The Hessian of the Lagrangian can be written as

\nabla^2_{xx} \mathcal{L}(x, \lambda) = \nabla^2 f(x) + \sum_i \lambda_i \nabla^2 c_i(x)

where f is the objective function, \lambda are the Lagrange multipliers, and c are the problem constraints. For this example, let’s not focus on the first term (the one with the objective function f).
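To make that second term concrete for this example: constraint 2 is c_2(x) = x_1 x_5 x_6, so its only nonzero second derivatives are

\frac{\partial^2 c_2}{\partial x_1 \partial x_5} = x_6, \qquad \frac{\partial^2 c_2}{\partial x_1 \partial x_6} = x_5, \qquad \frac{\partial^2 c_2}{\partial x_5 \partial x_6} = x_1,

and \lambda_2 \nabla^2 c_2(x) therefore only touches the (1,5), (1,6) and (5,6) entries of the Hessian (plus their symmetric counterparts). That is exactly the kind of sparsity pattern we should see further down.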
In order to compute the Hessian of `test!`, we can write the following:
using ForwardDiff

function hessian(f!, x, lambda)
    # Gradient of lambda' * c(x): transposed constraint Jacobian times the multipliers
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)
        ForwardDiff.jacobian(f!, dx, x)' * lambda
    end
    # Differentiating that gradient once more gives the Hessian of lambda' * c(x)
    ForwardDiff.jacobian(extended_gradient, x)
end
Let’s test it:
julia> hessian(test!, rand(6), rand(3))
6×6 Array{Float64,2}:
0.0 1.01823 0.0 0.0 0.0675784 0.383813
1.01823 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.30392 0.0 0.0
0.0 0.0 0.30392 0.0 0.0 0.0
0.0675784 0.0 0.0 0.0 0.0 0.474181
0.383813 0.0 0.0 0.0 0.474181 0.0
Great! The output is exactly what we want: a square and symmetric matrix.
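And just to convince myself further (this check is something I added while writing this post), a few entries match the second derivatives computed by hand:

x = rand(6); lambda = rand(3)
H = hessian(test!, x, lambda)
H[1, 2] ≈ lambda[1] + lambda[3]  # x1*x2 appears in constraints 1 and 3
H[3, 4] ≈ lambda[1]              # x3*x4 appears only in constraint 1
H[5, 6] ≈ lambda[2] * x[1]       # from constraint 2, x1*x5*x6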
So, what are the issues then? Well, here are the issues I can identify:

- We should try to use `ForwardDiff.jacobian!` instead of `ForwardDiff.jacobian`, and ideally pass it a `JacobianConfig`. However, when I tried to create a `JacobianConfig` for the outer differentiation call beforehand, I realised that it needs `extended_gradient` as one of its arguments, and that would require knowing `lambda` beforehand too…
- We are using forward-mode differentiation over forward-mode, but other people have recommended using mixed-mode AD; more specifically, for this case we should be using forward over reverse. However, when I benchmarked both approaches, forward over forward is actually faster… Is this because I am using the most straightforward (but not the most efficient) calls, `ForwardDiff.jacobian` and `ReverseDiff.jacobian`? (This would then relate to the first item on this list.)
- This approach requires creating a new `extended_gradient` closure every time `hessian` is called, in order to capture the appropriate `lambda`. (The furthest I got on this and on the first item is the sketch right after this list.)
- (Am I missing any more issues?)
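For what it’s worth, here is the furthest I got on the first and third items. The idea is to build the closure and the outer `JacobianConfig` once, let the closure capture a preallocated `lambda` buffer, and overwrite that buffer in place on every Hessian evaluation. This is just a sketch (the names `HessianWorkspace`, `build_workspace` and `hessian!` are mine, not part of ForwardDiff), and it does nothing about the allocations inside the inner `ForwardDiff.jacobian` call, since in there `x` carries dual numbers and I am not sure how to prebuild that config:

using ForwardDiff

struct HessianWorkspace{F,C,L<:AbstractVector,H<:AbstractMatrix}
    extended_gradient::F  # closure capturing f! and the lambda buffer below
    cfg::C                # JacobianConfig for the outer ForwardDiff.jacobian! call
    lambda::L             # preallocated multiplier buffer, overwritten on every call
    hess::H               # preallocated Hessian storage
end

function build_workspace(f!, ncon, x)
    lambda = zeros(ncon)
    function extended_gradient(z::AbstractArray{T}) where {T}
        dx = zeros(T, ncon)
        ForwardDiff.jacobian(f!, dx, z)' * lambda  # gradient of lambda' * c(z)
    end
    cfg = ForwardDiff.JacobianConfig(extended_gradient, x)
    HessianWorkspace(extended_gradient, cfg, lambda, zeros(length(x), length(x)))
end

function hessian!(ws::HessianWorkspace, x, lambda)
    copyto!(ws.lambda, lambda)  # the closure sees the updated multipliers
    ForwardDiff.jacobian!(ws.hess, ws.extended_gradient, x, ws.cfg)
    return ws.hess
end

# usage:
ws = build_workspace(test!, 3, zeros(6))
hessian!(ws, rand(6), rand(3))  # same kind of output as hessian(test!, x, lambda) above

But I am not sure this is the right direction, hence this post.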
So, that’s it! I tried to tackle each of the points above one at a time, but I got stuck on all of them. I’d really appreciate it if someone could give me a hand here… If you are reading this, thank you for sticking with me this far!
If it can be of any help, here’s a link to the full working code: notebook.
Additional information
Here are the benchmarks I mention in the second bullet point:
using BenchmarkTools, ReverseDiff

function hessian1(f!, x, lambda)
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)
        ForwardDiff.jacobian(f!, dx, x)' * lambda  # inner Jacobian via forward mode
    end
    ForwardDiff.jacobian(extended_gradient, x)     # forward over forward
end

function hessian2(f!, x, lambda)
    function extended_gradient(x::AbstractArray{T}) where {T}
        dx = zeros(T, 3)
        ReverseDiff.jacobian(f!, dx, x)' * lambda  # inner Jacobian via reverse mode
    end
    ForwardDiff.jacobian(extended_gradient, x)     # forward over reverse
end
julia> @btime hessian1(test!, rand(6), rand(3))
1.825 μs (16 allocations: 8.98 KiB)
6×6 Array{Float64,2}:
0.0 0.993312 0.0 0.0 0.484082 0.19054
0.993312 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.720973 0.0 0.0
0.0 0.0 0.720973 0.0 0.0 0.0
0.484082 0.0 0.0 0.0 0.0 0.608634
0.19054 0.0 0.0 0.0 0.608634 0.0
julia> @btime hessian2(test!, rand(6), rand(3))
3.719 μs (84 allocations: 9.94 KiB)
6×6 Array{Float64,2}:
0.0 0.809991 0.0 0.0 0.236583 0.206275
0.809991 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0608866 0.0 0.0
0.0 0.0 0.0608866 0.0 0.0 0.0
0.236583 0.0 0.0 0.0 0.0 0.198622
0.206275 0.0 0.0 0.0 0.198622 0.0
I had thought forward-over-reverse would be faster than forward-over-forward.
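(A small caveat I realised about these numbers: since the arguments are not interpolated, the `rand` calls and the global lookups are measured together with the Hessian evaluation. With BenchmarkTools that is usually avoided by interpolating with `$`, e.g. as below, although I would expect both variants to be affected in the same way.)

x = rand(6); lambda = rand(3)
@btime hessian1($test!, $x, $lambda)
@btime hessian2($test!, $x, $lambda)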
And, in case you are wondering where the `lambda` comes from: I am using Knitro.jl to solve the NLP, which requires the user to define a specific callback where the evaluation of the Hessian takes place. The `lambda` is passed as an argument to that user-defined callback. (This is not a limitation of Knitro.jl; e.g., Ipopt works in a similar way.) Here’s a simplified example of what that callback would look like:
function cb_eval_h_con_dyn(kc, cb, evalRequest, evalResult, userParams)
    x = evalRequest.x
    lambda = evalRequest.lambda
    hess = hessian(test!, x, lambda)
    # Assigning only the nonzeros (`hess` is sparse)
    evalResult.hess = hess[ind_nonzero]
    return 0
end
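And, just to connect the two parts: if something like the `HessianWorkspace` sketch above turned out to be the right direction, I imagine the callback would end up looking roughly like this (again only a sketch of mine; `ind_nonzero` is defined elsewhere in the full code):

const ws = build_workspace(test!, 3, zeros(6))  # built once, before the solve

function cb_eval_h_con_dyn(kc, cb, evalRequest, evalResult, userParams)
    hess = hessian!(ws, evalRequest.x, evalRequest.lambda)  # reuses preallocated buffers
    # Assigning only the nonzeros (`hess` is sparse), as before
    evalResult.hess = hess[ind_nonzero]
    return 0
end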