How to compute the last column of the Hessian (over and over) with autodiff, without computing the entire Hessian?

Hi community, I have an extended question from this post.

I have a loss function f(y, X, β)::Number which requires data y and X. The matrix X changes often, and I want to compute:

  1. The last column of the Hessian of f with respect to β (this is the only thing I don’t know how to do yet)
  2. The last entry of the Hessian
  3. The last entry of gradient
  4. The first (p-1)×(p-1) block of the Hessian (solved in the previous post)

Can someone teach me how to accomplish (1)? Below I show code for doing items (2)-(4). Importantly, I want to do these without computing the entire gradient/Hessian.


As in the original post, let f(\beta) = \frac{1}{2}\|y - X\beta\|_2^2 be the least squares loss. The analytical gradient is \nabla f(\beta) = -X'(y-X\beta) and the analytical Hessian is \nabla^2 f(\beta) = X'X.

using ForwardDiff

function f(y::AbstractVector, X::AbstractMatrix, β::AbstractVector)
    0.5 * sum(abs2, y - X*β)
end
f(β::AbstractVector) = f(y, X, β)
f(y::AbstractVector, X::AbstractMatrix, β1, β2) = f(y, X, [β1; β2])
f1(β1::AbstractVector) = f(y, X, β1, β2)
f1(β1::Number) = f(y, X, β1, β2)
f2(β2::AbstractVector) = f(y, X, β1, β2)
f2(β2::Number) = f(y, X, β1, β2)

# simulate data
n = 3
p = 5
X = randn(n, p)
y = randn(n)
β = randn(p)
β1 = β[1:end-1]
β2 = β[end]

# analytical gradient and Hessian
grad_true = -X'*(y-X*β)
hess_true = X'*X

(2) Compute last entry of Hessian:

get_hess_last(β2::Number) = ForwardDiff.derivative(β2 -> ForwardDiff.derivative(f2, β2), β2)
@show get_hess_last(β2) ≈ hess_true[end, end]

(3) Compute last entry of gradient:

get_grad_last(β2::Number) = ForwardDiff.derivative(f2, β2)
@show get_grad_last(β2) ≈ grad_true[end]

(4) Compute first (p-1)×(p-1) block of Hessian:

get_hess_first(x::AbstractVector) = ForwardDiff.hessian(f1, x)
@show all(get_hess_first(β1) .≈ hess_true[1:end-1, 1:end-1])

Can someone teach me how to compute ForwardDiff.hessian(f, β)[:, end] without having to compute the entire Hessian?

This discussion provides a way to achieve the desired result via FastDifferentiation.jl. However, I am still interested in knowing how to do so via ForwardDiff.jl. Any help would be highly appreciated.

The Hessian is the Jacobian of the gradient. One column of the Hessian is thus the Jacobian of your special "one-element gradient", but where you include all variables in the Jacobian computation.

@baggepinnen Thank you for looking into this problem, but I guess I just don't know how to write the syntax for taking the Jacobian of a one-element gradient. Could you be a bit more specific?

Here’s, e.g., my attempt that doesn’t work.

julia> ∂Hf(x) = ForwardDiff.jacobian(x -> ForwardDiff.gradient(f2, [x[end]]), x)
julia> ∂Hf(β)

1×5 Matrix{Float64}:
 0.0  0.0  0.0  0.0  3.24298

Actual Hessian:

julia> X'*X
5×5 Matrix{Float64}:
  3.03471   -0.587784  1.43574  0.491986  2.63233
 -0.587784   0.886295  1.14916  0.263948  0.137077
  1.43574    1.14916   3.98298  1.17415   2.96848
  0.491986   0.263948  1.17415  0.362458  0.947429
  2.63233    0.137077  2.96848  0.947429  3.24298
function hessian_column(β)
    element_derivative = function(β)
        β1 = β[1:end-1]
        β2 = β[end]
        ForwardDiff.derivative(β2 -> f(y, X, β1, β2), β2)
    end
    ForwardDiff.gradient(element_derivative, β)
end

H1 = ForwardDiff.hessian(f, β)
H2 = hessian_column(β)

using Test
@test H2 ≈ H1[:, end]

julia> @test H2 ≈ H1[:, end]
Test Passed

The key here is that the innermost function element_derivative performs the splitting of β, so that the outer call to gradient can be passed the full β. Note that I call gradient instead of jacobian, since element_derivative is a scalar-valued function.
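For what it's worth, the same column can also be viewed as a Hessian-vector product: the directional derivative of the gradient along the last basis vector e_p equals H*e_p, i.e. the last column. A minimal sketch of that approach (the names hess_vec and g are mine, not from the thread):

using ForwardDiff

# d/dt ∇g(β + t*v) at t = 0 equals H(β)*v. With v = e_p (the last basis
# vector), this yields the last column of the Hessian without ever
# forming the full p×p matrix.
hess_vec(g, β, v) = ForwardDiff.derivative(t -> ForwardDiff.gradient(g, β .+ t .* v), 0.0)

n, p = 3, 5
X = randn(n, p); y = randn(n); β = randn(p)
g(β) = 0.5 * sum(abs2, y - X*β)   # the least-squares loss from above

v = zeros(p); v[end] = 1.0        # e_p
col = hess_vec(g, β, v)
col ≈ (X'X)[:, end]                # compare against the analytical Hessian column

Like hessian_column, this still costs roughly one full gradient evaluation (plus one extra dual dimension), but it avoids the O(p²) cost of the full Hessian, and any other column is available by changing v.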


Thank you so much. This is tremendously helpful. Please have a nice day.
