How to compute Hessian sub-blocks with autodiff?

Hi community,

I’m trying to use ForwardDiff.jl to compute a Hessian for a log likelihood function. Computing the full Hessian takes a long time, and I’m only interested in a sub-block of the Hessian anyway. Is there any way to compute a “sub-block” of this Hessian without computing the entire thing?

Here’s a toy example. Let f(\beta) = \frac{1}{2}\|y - X\beta\|_2^2 be the least squares loss. The analytical Hessian is \nabla^2 f(\beta) = X'X. If X is n \times p, then the Hessian is p \times p, which can be large when p is large. If I’m only interested in the first 5 \times 5 block, how can I get it?

Code:

using ForwardDiff
function f(y::AbstractVector, X::AbstractMatrix, β::AbstractVector)
    0.5*sum(abs2, y - X*β)
end
f(β::AbstractVector) = f(y, X, β)  # closes over the globals y and X

# simulate data
n = 3
p = 50
X = randn(n, p)
y = randn(n)
β = randn(p)
@show f(β)

# analytical Hessian
@show X'*X

# autodiff Hessian (p by p)
∇²f = x -> ForwardDiff.hessian(f, x)
@show ∇²f(β) 

How can I get ∇²f(β)[1:5, 1:5] without computing ∇²f(β)[6:end, 6:end]?


A partial answer (hope someone has a better one) that only avoids computing ∇²f(β)[6:end, :] is

julia> const R = 1:5
1:5

julia> G(x) = ForwardDiff.gradient(f, x)[R]
G (generic function with 1 method)

julia> H(x) = ForwardDiff.jacobian(G, x)[:,R]
H (generic function with 1 method)

julia> H(β) - (X'*X)[R,R]
5×5 Matrix{Float64}:
  0.0           0.0          1.38778e-17  -2.22045e-16  -4.16334e-17
  0.0          -1.11022e-16  0.0          -2.77556e-17   0.0
  1.38778e-17   0.0          2.77556e-17   0.0           0.0
 -2.22045e-16  -2.77556e-17  0.0           0.0          -5.55112e-17
 -4.16334e-17   0.0          0.0          -5.55112e-17   0.0
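
A further refinement (just a sketch along the same lines; H2 is a name I’m making up here) is to hold the tail coordinates fixed, so only 5 dual perturbations are carried and the unwanted columns are never formed:

# sketch: perturb only the first 5 coordinates; x[6:end] is held fixed,
# so columns 6:end of the Hessian are never computed
H2(x) = ForwardDiff.jacobian(t -> G(vcat(t, x[6:end])), x[1:5])
H2(β) ≈ (X'*X)[R, R]  # should hold up to floating-point error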

You might find SparseDiffTools.jl useful if you have sparsity.

You can try something like

ForwardDiff.hessian(βsub -> f([βsub; β[6:end]]), β[1:5])

i.e., fix the other variables and differentiate with respect to the first 5, possibly with @views.

This assumes you don’t want to, or can’t, use X[:,1:5]' * X[:,1:5] directly, which would be simpler and faster.
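
For the toy problem, a quick sanity check (reusing X from the original post) confirms the shortcut:

# the (1:5, 1:5) block of X'X is the Gram matrix of the first 5 columns of X
X[:, 1:5]' * X[:, 1:5] ≈ (X' * X)[1:5, 1:5]  # true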


@abelsiqueira your suggestion makes sense to me, but it seems to recompile every time I run it:

julia> @time ForwardDiff.hessian(βsub -> f([βsub; β[6:end]]), β[1:5]);

  0.846585 seconds (2.78 M allocations: 181.728 MiB, 3.49% gc time, 99.55% compilation time)

julia> @time ForwardDiff.hessian(βsub -> f([βsub; β[6:end]]), β[1:5]);
  0.837404 seconds (2.78 M allocations: 181.686 MiB, 3.57% gc time, 99.56% compilation time)

As you can see, 99% of the time is spent on compilation in every @time call. Do you have any idea why? If we can’t avoid compilation, computing the full Hessian with autodiff is actually much faster:

julia> @time ∇²f(β);
  0.000904 seconds (90 allocations: 298.297 KiB)

Using views should help: see the blog post “The @view and @views macros: are you sure you know how they work?” by Bogumił Kamiński.

It seems @views is not helping much

julia> @time ForwardDiff.hessian(βsub -> f([βsub; @view(β[6:end])]), @view(β[1:5]));
  0.788638 seconds (2.54 M allocations: 167.458 MiB, 5.49% gc time, 99.74% compilation time)

julia> @time ForwardDiff.hessian(βsub -> f([βsub; @view(β[6:end])]), @view(β[1:5]));
  0.796364 seconds (2.54 M allocations: 167.530 MiB, 4.17% gc time, 99.72% compilation time)

I think the main issue is compilation time. Also, I don’t see how to avoid allocating [βsub; β[6:end]] with @views. Could you be a bit more specific?

Thank you for your time, and sorry if I’m not understanding something basic.

Edit:
I got rid of the extra allocations and compilation by defining f2, which manually partitions the input vector into two pieces, and differentiating with respect to only the first piece.

# here β = [β1; β2]
function f2(y::AbstractVector, X::AbstractMatrix, β1::AbstractVector, β2::AbstractVector)
    0.5*sum(abs2, y - X*[β1; β2])
end
f2(β1::AbstractVector) = f2(y, X, β1, β2)  # closes over the globals y, X, and β2

∇²f2 = x -> ForwardDiff.hessian(f2, x)

β1 = β[1:5]
β2 = β[6:end]
@time ∇²f2(β1) # 0.000303 seconds (14 allocations: 290.797 KiB)
all(∇²f2(β1) .≈ ∇²f(β)[1:5, 1:5]) # true
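
I believe the earlier recompilation happened because every @time call constructed a brand-new anonymous function βsub -> f([βsub; β[6:end]]), and ForwardDiff.hessian specializes on the closure’s type, so each fresh closure paid compilation again. A minimal sketch of that workaround (defining the closure once and reusing it):

# define the closure once, so ForwardDiff compiles for its type only once
g = βsub -> f([βsub; β[6:end]])
@time ForwardDiff.hessian(g, β[1:5]);  # first call pays compilation
@time ForwardDiff.hessian(g, β[1:5]);  # subsequent calls are fast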

Thank you again for your help, very much appreciated.


Happy to help, glad to know it worked with the modifications.

If I’m understanding what you intend, then you can do this using FastDifferentiation.jl:

module Analytical
using FastDifferentiation
using BenchmarkTools

function f(y::AbstractVector, X::AbstractMatrix, β::AbstractVector)
    0.5*sum(abs2, y - X*β)
end

# simulate data
n = 3
p = 50
βsym = make_variables(:βsym, p)
X = randn(n, p)
y = randn(n)
β = randn(p)

f(β::AbstractVector) = f(y, X, β)

function show_res()
    k = 5
    # symbolic Hessian of f with respect to the first k variables only
    hess = hessian(f(βsym), βsym[1:k])
    hess_func! = make_function(hess, βsym, in_place=true)

    # analytical Hessian sub-block
    XX = (X'*X)[1:k, 1:k]
    @show XX

    resmat = similar(hess, Float64)
    hess_func!(resmat, β)
    @show resmat

    @show sum(abs.(XX - resmat))

    @benchmark $hess_func!($resmat, $β)
end
export show_res
end # module Analytical

Here’s the output. I used the in_place=true keyword argument so that hess_func! doesn’t allocate.

julia> show_res()
XX = [2.86749744138595 -0.6004909386526686 -1.1123983182855144 1.0337505040259434 0.29177138544566084; -0.6004909386526686 9.065653688937276 -2.289940217909246 -2.1947415284286453 0.7768571507417126; -1.1123983182855144 -2.289940217909246 1.4173807474637325 0.468163520049563 -1.358271008725635; 1.0337505040259434 -2.1947415284286453 0.468163520049563 1.1633991781264883 -1.2252714278612105; 0.29177138544566084 0.7768571507417126 -1.358271008725635 -1.2252714278612105 3.822718768576883]
resmat = [2.86749744138595 -0.6004909386526684 -1.1123983182855144 1.0337505040259436 0.2917713854456607; -0.6004909386526684 9.065653688937276 -2.289940217909246 -2.1947415284286453 0.7768571507417126; -1.1123983182855144 -2.289940217909246 1.4173807474637325 0.4681635200495631 -1.3582710087256353; 1.0337505040259436 -2.1947415284286453 0.4681635200495631 1.163399178126488 -1.2252714278612105; 0.2917713854456607 0.7768571507417126 -1.3582710087256353 -1.2252714278612105 3.822718768576883]
sum(abs.(XX - resmat)) = 1.6653345369377348e-15
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):   9.710 ns … 34.835 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.310 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.347 ns ±  0.534 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                       ▅   █
  ▂▁▁▁▇▁▁▁▄▁▁▁▇▁▁▁▄▁▁▁▁█▁▁▁█▁▁▁▄▁▁▁█▁▁▁▇▁▁▁▁▄▁▁▁▅▁▁▁▄▁▁▁▃▁▁▁▂ ▃
  9.71 ns         Histogram: frequency by time        11.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.