Autodiff: calculate just the diagonal of the Hessian

#1

Given a function f(x) from R^2 -> R, is there an efficient way to calculate just the diagonal entries of the Hessian matrix using any of the autodiff tools? I know that at the function’s max, all the off-diagonal partials are zero, e.g.

using ForwardDiff, LinearAlgebra  # dot lives in LinearAlgebra
f(x) = dot(x, x)
ForwardDiff.hessian(f, [0.0, 0.0])

# 2×2 Array{Float64,2}:
# 2.0  0.0
# 0.0  2.0

In reality, the function is more complicated and the dimension of x might be much larger, such that computing the whole Hessian might use up all my memory. Hacky solutions are fine for my current purposes. Thanks in advance!
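For reference, the obvious baseline is to compute the full Hessian and keep only its diagonal, which is exactly the n^2 memory cost I'm trying to avoid:

```julia
using ForwardDiff, LinearAlgebra

f(x) = dot(x, x)
# computes all n^2 entries just to keep n of them
diag(ForwardDiff.hessian(f, [0.0, 0.0]))  # [2.0, 2.0]
```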


#2

This is definitely hacky. What this code does is generate n single-variable slice functions in a vector, then differentiate those functions:

using ForwardDiff, LinearAlgebra

# test function: df/dx[i] = 2i*x[i], d2f/dx[i]2 = 2i
ff(x) = dot(x.^2, 1:length(x))

function generate_gradient(ff, xx)
    z = Array{Any,1}(undef, length(xx))
    for i = 1:length(xx)
        a = y -> ff(vcat(xx[1:(i-1)], y, xx[(i+1):end]))
        z[i] = w -> ForwardDiff.derivative(a, w)
    end
    return z
end

function diag_hessian(z_gradient, x)
    res = zeros(size(x))
    for i = 1:length(x)
        res[i] = ForwardDiff.derivative(z_gradient[i], x[i])
    end
    return res
end

x = ones(10)
gradx = generate_gradient(ff, x)
diaghesx = diag_hessian(gradx, x)



#3

Update: I merged the two into one convenience function:

ff1(x) = LinearAlgebra.dot(x.^2, 1:length(x)) # df/dx[i] = 2i*x[i], d2f/dx[i]2 = 2i
ff2(x) = sum(x.^3)                            # df/dx[i] = 3x[i]^2, d2f/dx[i]2 = 6x[i]

function diag_hessian(f, xx)
    z = Array{eltype(xx),1}(undef, length(xx))
    for i = 1:length(xx)
        a0 = w -> f(vcat(xx[1:(i-1)], w, xx[(i+1):end])) # this maybe can be optimized
        a1 = w -> ForwardDiff.derivative(a0, w)
        z[i] = ForwardDiff.derivative(a1, xx[i])
        return z
    end

    y1 = diag_hessian(ff1, x1)
    y2 = diag_hessian(ff2, x1)
end


Edit: changed the element type from Any to eltype(xx).


#4

At a size of 1000, I get the following results:

# ForwardDiff.hessian
  10.669 s (21351 allocations: 8.90 GiB)
# diag_hessian
  44.124 ms (81630 allocations: 64.18 MiB)

You can optimize this further with some multithreading, since all the per-entry calculations are independent. On my machine this code is slower than ForwardDiff.hessian for sizes smaller than 100 elements, but it always consumes less memory than the full-Hessian counterpart. I don't know how to reduce the allocations; maybe using the internals of ForwardDiff could help.
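The multithreading suggestion can be sketched like this (each diagonal entry is an independent nested derivative, so the loop parallelizes trivially; start Julia with multiple threads, e.g. julia -t 4). diag_hessian_threaded is a name for this sketch, not from the thread:

```julia
using ForwardDiff

# same slice-and-differentiate idea as above, with the loop parallelized
function diag_hessian_threaded(f, xx)
    z = similar(xx)
    Threads.@threads for i in 1:length(xx)
        a0 = w -> f(vcat(xx[1:(i-1)], w, xx[(i+1):end]))
        a1 = w -> ForwardDiff.derivative(a0, w)
        z[i] = ForwardDiff.derivative(a1, xx[i])
    end
    return z
end

ff2(x) = sum(x.^3)  # d2f/dx[i]2 = 6x[i]
diag_hessian_threaded(ff2, ones(4))  # [6.0, 6.0, 6.0, 6.0]
```

Note this doesn't reduce the total allocations (the vcat copies are still there); it only spreads the work across threads.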

If you are calculating a big problem, I don't think forward-mode AD is the adequate mode, as its cost scales linearly with the number of inputs. Maybe a reverse-mode AD package can help you more, but I don't know whether it's possible to get a diagonal Hessian with those methods.


#5

Awesome, thank you, that's good enough for my purposes for now; I'm just working through this paper and coding parts of it up to make sure I understand it. NB, you've got a misplaced end (it should come right after the return z).

I'd been trying something similar to your answer, but setting each element of xx to w in place and then restoring its original value before moving on to the next element. That avoids the allocations from vcat, but it causes ForwardDiff to choke; I think it's basically the same issue as here. So it probably can be optimized, but you know what they say about premature optimization!
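For what it's worth, the in-place idea can be rescued by copying x into a buffer whose element type matches the incoming dual number, instead of mutating the original Float64 vector (which is why ForwardDiff chokes). A sketch; diag_hessian_buffered is my name for it, not from the thread:

```julia
using ForwardDiff

# A Vector{Float64} can't hold ForwardDiff.Dual numbers, so allocate a buffer
# with the (possibly nested) dual element type and perturb one entry at a time.
function diag_hessian_buffered(f, x)
    res = similar(x)
    for i in 1:length(x)
        g = w -> begin
            buf = similar(x, typeof(w))  # element type follows the dual number
            copyto!(buf, x)
            buf[i] = w
            f(buf)
        end
        res[i] = ForwardDiff.derivative(v -> ForwardDiff.derivative(g, v), x[i])
    end
    return res
end

diag_hessian_buffered(x -> sum(x.^3), [1.0, 2.0])  # [6.0, 12.0]
```

This still allocates one buffer per inner call, so it is a sketch of the idea rather than a guaranteed speedup over the vcat version.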


#6

I tried that too, haha. I was playing with derivatives for thermodynamic equations and faced similar problems.


#7

If you know H is diagonal, then H*e gets you what you need, where e is the vector of all ones. So in theory the optimal method is to use forward-mode diff to compute the derivative of x -> sum(grad_f(x)), where grad_f is computed by reverse mode. If you're in dimension 2, however, just do nested forward diff to compute both diagonal elements.
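A pure-ForwardDiff sketch of the H*e idea (forward-over-forward rather than the forward-over-reverse suggested above, so the inner gradient still costs n forward passes, but it shows the principle):

```julia
using ForwardDiff

# d/dt grad_f(x + t*v) at t = 0 is the Hessian-vector product H*v;
# when H is diagonal, v = ones recovers exactly the diagonal.
hvp(f, x, v) = ForwardDiff.derivative(t -> ForwardDiff.gradient(f, x .+ t .* v), 0.0)

f(x) = sum(x.^3)  # Hessian is diagonal with entries 6x[i]
hvp(f, [1.0, 2.0, 3.0], ones(3))  # [6.0, 12.0, 18.0]
```

For a truly memory-light version on large problems, grad_f would come from a reverse-mode package as the post suggests, with one forward pass layered on top.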
