How to compute the last column of Hessian (over and over) with autodiff without computing entire Hessian?

biona001 · September 6, 2023, 9:16pm

Hi community, I have an extended question from this post.

I have a loss function f(y, X, β)::Number which requires data y and X. Matrix X changes often and I want to compute:

The last column of the Hessian of f with respect to β (this is the only thing I don’t know how to do yet)
The last entry of the Hessian
The last entry of gradient
The first 5*5 block of the Hessian (solved in previous post)

Can someone teach me how to accomplish (1)? Below I show code for doing items (2)-(4). Importantly, I want to do these without computing the entire gradient/Hessian.

Examples

As in the original post, let f(\beta) = \frac{1}{2}\|y - X\beta\|_2^2 be the least squares loss. The analytical gradient is \nabla f(\beta) = -X'(y-X\beta) and analytical Hessian is \nabla^2f(\beta) = X'X

using ForwardDiff
function f(y::AbstractVector, X::AbstractMatrix, β::AbstractVector)
    0.5*sum(abs2, y - X*β)
end
f(β::AbstractVector) = f(y, X, β)
f(y::AbstractVector, X::AbstractMatrix, β1, β2) = f(y, X, [β1; β2])
f1(β1::AbstractVector) = f(y, X, β1, β2)
f1(β1::Number) = f(y, X, β1, β2)
f2(β2::AbstractVector) = f(y, X, β1, β2)
f2(β2::Number) = f(y, X, β1, β2)

# simulate data
n = 3
p = 5
X = randn(n, p)
y = randn(n)
β = randn(p)
β1 = β[1:end-1]
β2 = β[end]

# analytical gradient and Hessian
grad_true = -X'*(y-X*β)
hess_true = X'*X

(2) Compute last entry of Hessian:

get_hess_last(β2::Number) = ForwardDiff.derivative(β2 -> ForwardDiff.derivative(f2, β2), β2)
@show get_hess_last(β2) ≈ hess_true[end, end]

(3) compute last entry of gradient

get_grad_last(β2::Number) = ForwardDiff.derivative(f2, β2)
@show get_grad_last(β2) ≈ grad_true[end]

(4) compute first (p-1)*(p-1) block of Hessian:

get_hess_first(x::AbstractVector) = ForwardDiff.hessian(f1, x)
@show all(get_hess_first(β1) .≈ hess_true[1:end-1, 1:end-1])

Can someone teach me how to compute ForwardDiff.hessian(f, β)[:, end] without having to compute the entire Hessian?

biona001 · September 6, 2023, 10:58pm

This discussion provides a way to achieve the desired result via FastDifferentiation.jl. However, I am still interested in knowing how to do so via ForwardDiff.jl. Any help would be highly appreciated.

baggepinnen · September 7, 2023, 4:38am

The hessian is the Jacobian of the gradient, one column of the hessian is thus the Jacobian of your special “one element gradient”, but where you include all variables for the Jacobian computation.

biona001 · September 7, 2023, 5:02am

@baggepinnen Thank you for looking into this problem, but I guess I just don’t know how to create the syntax for taking the jacobian of a one element gradient. Could you be a bit more specific?

Here’s, e.g., my attempt that doesn’t work.

julia> ∂Hf(x) = ForwardDiff.jacobian(x -> ForwardDiff.gradient(f2, [x[end]]), x)
julia> ∂Hf(β)

1×5 Matrix{Float64}:
 0.0  0.0  0.0  0.0  3.24298

Actual Hessian:

julia> X'*X
  3.03471   -0.587784  1.43574  0.491986  2.63233
 -0.587784   0.886295  1.14916  0.263948  0.137077
  1.43574    1.14916   3.98298  1.17415   2.96848
  0.491986   0.263948  1.17415  0.362458  0.947429
  2.63233    0.137077  2.96848  0.947429  3.24298

baggepinnen · September 7, 2023, 5:16am

function hessian_column(β)
      element_derivative = function(β)
            β1 = β[1:end-1]
            β2 = β[end]
            ForwardDiff.derivative(β2->f(y,X,β1,β2), β2)
      end
      ForwardDiff.gradient(element_derivative, β)
end

H1 = ForwardDiff.hessian(f, β)
H2 = hessian_column(β)

using Test
@test H2 ≈ H1[:, end]


julia> @test H2 ≈ H1[:, end]
Test Passed

Key here is that the innermost function element_derivative performs the splitting of β such that the outer call to gradient can pass in the full β. Note that I call gradient instead of Jacobian, since it operates on a scalar function.

biona001 · September 7, 2023, 5:27am

Thank you so much. This is tremendously helpful. Please have a nice day.

Topic		Replies	Views
How to compute Hessian sub-blocks with autodiff? Performance autodiff	7	249	September 4, 2023
Autodiff: calculate just the diagonal of the Hessian General Usage question	6	3060	April 8, 2019
Repeatedly calculating a hessian using both ForwardDiff and ReverseDiff Optimization (Mathematical) question , differentiation	3	1497	February 2, 2021
Computing value and first & second derivatives of a scalar function with scalar input, using ForwardDiff Numerics forwarddiff , autodiff	6	463	August 16, 2023
Mixed partials Performance question , forwarddiff	1	380	October 1, 2021

How to compute the last column of Hessian (over and over) with autodiff without computing entire Hessian?

Examples

Related topics