# How to compute Hessian sub-blocks with autodiff?

Hi community,

I’m trying to use ForwardDiff.jl to compute a Hessian for a log likelihood function. Computing the full Hessian takes a long time, and I’m only interested in a sub-block of the Hessian anyway. Is there any way to compute a “sub-block” of this Hessian without computing the entire thing?

Here’s a toy example. Let f(\beta) = \frac{1}{2}\|y - X\beta\|_2^2 be the least squares loss. Since \nabla f(\beta) = X'(X\beta - y), the analytical Hessian is \nabla^2 f(\beta) = X'X. If X is n \times p, then the Hessian is p \times p, which can be large when p is large. If I’m only interested in the leading 5 \times 5 block, how can I get it?

Code:

using ForwardDiff
function f(y::AbstractVector, X::AbstractMatrix, β::AbstractVector)
    0.5*sum(abs2, y - X*β)
end
f(β::AbstractVector) = f(y, X, β)

# simulate data
n = 3
p = 50
X = randn(n, p)
y = randn(n)
β = randn(p)
@show f(β)

# analytical Hessian
@show X'*X

# autodiff Hessian (p by p)
∇²f = x -> ForwardDiff.hessian(f, x)
@show ∇²f(β)


How can I get ∇²f(β)[1:5, 1:5] without computing ∇²f(β)[6:end, 6:end]?


A partial answer (hope someone has a better one) that only avoids computing ∇²f(β)[6:end, :] is

julia> const R = 1:5
1:5

julia> G(x) = ForwardDiff.gradient(f, x)[R]
G (generic function with 1 method)

julia> H(x) = ForwardDiff.jacobian(G, x)[:,R]
H (generic function with 1 method)

julia> H(β) - (X'*X)[R,R]
5×5 Matrix{Float64}:
  0.0           0.0           1.38778e-17  -2.22045e-16  -4.16334e-17
  0.0          -1.11022e-16   0.0          -2.77556e-17   0.0
  1.38778e-17   0.0           2.77556e-17   0.0           0.0
 -2.22045e-16  -2.77556e-17   0.0           0.0          -5.55112e-17
 -4.16334e-17   0.0           0.0          -5.55112e-17   0.0
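
For a general sub-block, the same gradient-then-Jacobian trick can be wrapped in a small helper. This is only a sketch based on the snippet above (`hessian_block`, `R1`, and `R2` are hypothetical names; it still evaluates the full gradient, but only differentiates its `R1` entries):

```julia
using ForwardDiff

# Rows R1 and columns R2 of the Hessian of f at x:
# differentiate only the R1 entries of the gradient,
# then keep the R2 columns of the resulting Jacobian.
function hessian_block(f, x, R1, R2)
    ForwardDiff.jacobian(z -> ForwardDiff.gradient(f, z)[R1], x)[:, R2]
end
```

With the least-squares f above, `hessian_block(f, β, 1:5, 1:5)` should match `(X'*X)[1:5, 1:5]`.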


You might find SparseDiffTools.jl useful if you have sparsity.

You can try something like

ForwardDiff.hessian(βsub -> f([βsub; β[6:end]]), β[1:5])


i.e., fix the other variables and differentiate wrt the first 5. Possibly with @views.

This assumes that you don’t want to, or can’t, use X[:,1:5]' * X[:,1:5] directly, which should be simpler and faster.
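
To make that comparison concrete, here is a small self-contained check (toy sizes, assuming the least-squares f from the original post) that the closed-form block matches the autodiff one:

```julia
using ForwardDiff

n, p = 3, 8
X = randn(n, p); y = randn(n); β = randn(p)
f(β) = 0.5 * sum(abs2, y - X*β)

analytic = X[:, 1:5]' * X[:, 1:5]               # closed-form 5×5 block of X'X
autodiff = ForwardDiff.hessian(f, β)[1:5, 1:5]  # same block via autodiff
analytic ≈ autodiff  # expected: true
```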


@abelsiqueira your suggestion makes sense to me, but it seems to need “compiling” every time I run it?

julia> @time ForwardDiff.hessian(βsub -> f([βsub; β[6:end]]), β[1:5]);

0.846585 seconds (2.78 M allocations: 181.728 MiB, 3.49% gc time, 99.55% compilation time)

julia> @time ForwardDiff.hessian(βsub -> f([βsub; β[6:end]]), β[1:5]);
0.837404 seconds (2.78 M allocations: 181.686 MiB, 3.57% gc time, 99.56% compilation time)


As you can see, 99% of the time is spent on compilation in every @time call. Do you have any idea why? If we can’t avoid compilation, computing the full Hessian with autodiff is actually much faster:

julia> @time ∇²f(β);
0.000904 seconds (90 allocations: 298.297 KiB)
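
One likely culprit (an assumption, not verified on this exact setup): each @time line builds a brand-new anonymous function, which has a fresh type, so ForwardDiff specializes and recompiles for it every time. Defining the closure once should pay compilation only on the first call:

```julia
using ForwardDiff

n, p = 3, 50
X = randn(n, p); y = randn(n); β = randn(p)
f(β) = 0.5 * sum(abs2, y - X*β)

g = βsub -> f([βsub; β[6:end]])       # construct the closure once
ForwardDiff.hessian(g, β[1:5])        # first call compiles
@time ForwardDiff.hessian(g, β[1:5])  # later calls reuse the compiled code
```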


Using views should help: “The @view and @views macros: are you sure you know how they work?” | Blog by Bogumił Kamiński

It seems @views is not helping much:

julia> @time ForwardDiff.hessian(βsub -> f([βsub; @view(β[6:end])]), @view(β[1:5]));
0.788638 seconds (2.54 M allocations: 167.458 MiB, 5.49% gc time, 99.74% compilation time)

julia> @time ForwardDiff.hessian(βsub -> f([βsub; @view(β[6:end])]), @view(β[1:5]));
0.796364 seconds (2.54 M allocations: 167.530 MiB, 4.17% gc time, 99.72% compilation time)


I think the main issue is compilation time. Also, I don’t see how I can avoid allocating [βsub; β[6:end]] with @views. Could you be a bit more specific?

Thank you for your time, and sorry if I’m not understanding something basic.

Edit:
I got rid of the extra allocations and compilation by defining f2, which manually partitions the input vector into two pieces and differentiates with respect to only the first piece.

# here β = [β1; β2]
function f2(y::AbstractVector, X::AbstractMatrix, β1::AbstractVector, β2::AbstractVector)
    0.5*sum(abs2, y - X*[β1; β2])
end
f2(β1::AbstractVector) = f2(y, X, β1, β2)

∇²f2 = x -> ForwardDiff.hessian(f2, x)

β1 = β[1:5]
β2 = β[6:end]
@time ∇²f2(β1) # 0.000303 seconds (14 allocations: 290.797 KiB)
all(∇²f2(β1) .≈ ∇²f(β)[1:5, 1:5]) # true


Thank you again for your help, very much appreciated.


Happy to help, glad to know it worked with the modifications.

If I’m understanding what you intend, then you can do this using FastDifferentiation:

module Analytical
using FastDifferentiation
using BenchmarkTools

function f(y::AbstractVector, X::AbstractMatrix, β::AbstractVector)
    0.5*sum(abs2, y - X*β)
end

# simulate data
n = 3
p = 50
βsym = make_variables(:βsym, p)
X = randn(n, p)
y = randn(n)
β = randn(p)

f(β::AbstractVector) = f(y, X, β)

function show_res()
    k = 5
    hess = hessian(f(βsym), βsym[1:k])
    hess_func! = make_function(hess, βsym, in_place=true)
    # @show f(β)

    XX = (X'*X)[1:5, 1:5]
    # analytical Hessian
    @show XX

    resmat = similar(hess, Float64)
    hess_func!(resmat, β)
    @show resmat

    @show sum(abs.(XX - resmat))

    @benchmark $hess_func!($resmat, $β)
end
export show_res
end # module Analytical



Here’s the output. I used the in_place=true keyword argument so that hess_func! doesn’t allocate.

julia> show_res()
XX = [2.86749744138595 -0.6004909386526686 -1.1123983182855144 1.0337505040259434 0.29177138544566084; -0.6004909386526686 9.065653688937276 -2.289940217909246 -2.1947415284286453 0.7768571507417126; -1.1123983182855144 -2.289940217909246 1.4173807474637325 0.468163520049563 -1.358271008725635; 1.0337505040259434 -2.1947415284286453 0.468163520049563 1.1633991781264883 -1.2252714278612105; 0.29177138544566084 0.7768571507417126 -1.358271008725635 -1.2252714278612105 3.822718768576883]
resmat = [2.86749744138595 -0.6004909386526684 -1.1123983182855144 1.0337505040259436 0.2917713854456607; -0.6004909386526684 9.065653688937276 -2.289940217909246 -2.1947415284286453 0.7768571507417126; -1.1123983182855144 -2.289940217909246 1.4173807474637325 0.4681635200495631 -1.3582710087256353; 1.0337505040259436 -2.1947415284286453 0.4681635200495631 1.163399178126488 -1.2252714278612105; 0.2917713854456607 0.7768571507417126 -1.3582710087256353 -1.2252714278612105 3.822718768576883]
sum(abs.(XX - resmat)) = 1.6653345369377348e-15
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
Range (min … max):   9.710 ns … 34.835 ns  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     10.310 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   10.347 ns ±  0.534 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

▅   █
▂▁▁▁▇▁▁▁▄▁▁▁▇▁▁▁▄▁▁▁▁█▁▁▁█▁▁▁▄▁▁▁█▁▁▁▇▁▁▁▁▄▁▁▁▅▁▁▁▄▁▁▁▃▁▁▁▂ ▃
9.71 ns         Histogram: frequency by time        11.1 ns <

Memory estimate: 0 bytes, allocs estimate: 0.
