Autodiff: calculate just the diagonal of the Hessian

#1

Given a function f(x) from R^2 -> R, is there an efficient way to calculate just the diagonal entries of the Hessian matrix using any of the autodiff tools? I know that at the function’s max, all the off-diagonal partials are zero, e.g.

using ForwardDiff, LinearAlgebra  # dot lives in LinearAlgebra
f(x) = dot(x, x)
ForwardDiff.hessian(f, [0.0, 0.0])

# 2×2 Array{Float64,2}:
# 2.0  0.0
# 0.0  2.0

In reality, the function is more complicated and the dimension of x might be much larger, such that computing the whole Hessian might use up all my memory. Hacky solutions are fine for my current purposes. Thanks in advance!
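For reference, the obvious baseline is to compute the full Hessian and keep only its diagonal, which is exactly the n^2 memory cost I'm trying to avoid:

```julia
using ForwardDiff, LinearAlgebra

f(x) = dot(x, x)
# computes all n^2 entries just to keep n of them
diag(ForwardDiff.hessian(f, [0.0, 0.0]))  # [2.0, 2.0]
```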


#2

This is definitely hacky. What this code does is generate n single-variable slice functions in a vector, then differentiate those functions:

using ForwardDiff, LinearAlgebra

# test function: df/dx[i] = 2i*x[i], d2f/dx[i]2 = 2i
ff(x) = dot(x.^2, 1:length(x))

function generate_gradient(ff, xx)
    z = Array{Any,1}(undef, length(xx))
    for i = 1:length(xx)
        a = y -> ff(vcat(xx[1:(i-1)], y, xx[(i+1):end]))
        z[i] = w -> ForwardDiff.derivative(a, w)
    end
    return z
end

function diag_hessian(z_gradient, x)
    res = zeros(size(x))
    for i = 1:length(x)
        res[i] = ForwardDiff.derivative(z_gradient[i], x[i])
    end
    return res
end

x = ones(10)
gradx = generate_gradient(ff, x)
diaghesx = diag_hessian(gradx, x)



#3

Update: I merged the two into one convenience function:

ff1(x) = LinearAlgebra.dot(x.^2, 1:length(x)) # df/dx[i] = 2i*x[i], d2f/dx[i]2 = 2i
ff2(x) = sum(x.^3)                            # df/dx[i] = 3x[i]^2, d2f/dx[i]2 = 6x[i]

function diag_hessian(f, xx)
    z = Array{eltype(xx),1}(undef, length(xx))
    for i = 1:length(xx)
        a0 = w -> f(vcat(xx[1:(i-1)], w, xx[(i+1):end])) # this maybe can be optimized
        a1 = w -> ForwardDiff.derivative(a0, w)
        z[i] = ForwardDiff.derivative(a1, xx[i])
        return z
    end

    y1 = diag_hessian(ff1, x1)
    y2 = diag_hessian(ff2, x1)
end


Edit: changed the element type from Any to eltype(xx).


#4

At a size of 1000, I get the following results:

# ForwardDiff.hessian
  10.669 s (21351 allocations: 8.90 GiB)
# diag_hessian
  44.124 ms (81630 allocations: 64.18 MiB)

You can optimize this further with some multithreading, since all the per-entry calculations are independent. On my machine this code is slower than ForwardDiff.hessian for sizes smaller than 100 elements, but it always consumes less memory than the full-Hessian counterpart. I don't know how to reduce the allocations; maybe using the internals of ForwardDiff could help.
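The multithreading suggestion can be sketched like this (each diagonal entry is an independent nested derivative, so the loop parallelizes trivially; start Julia with multiple threads, e.g. julia -t 4). diag_hessian_threaded is a name for this sketch, not from the thread:

```julia
using ForwardDiff

# same slice-and-differentiate idea as above, with the loop parallelized
function diag_hessian_threaded(f, xx)
    z = similar(xx)
    Threads.@threads for i in 1:length(xx)
        a0 = w -> f(vcat(xx[1:(i-1)], w, xx[(i+1):end]))
        a1 = w -> ForwardDiff.derivative(a0, w)
        z[i] = ForwardDiff.derivative(a1, xx[i])
    end
    return z
end

ff2(x) = sum(x.^3)  # d2f/dx[i]2 = 6x[i]
diag_hessian_threaded(ff2, ones(4))  # [6.0, 6.0, 6.0, 6.0]
```

Note this doesn't reduce the total allocations (the vcat copies are still there); it only spreads the work across threads.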

If you are calculating a big problem, I don't think forward-mode AD is the adequate mode, as its cost scales linearly with the number of inputs. Maybe a reverse-mode AD package can help you more, but I don't know whether it's possible to get a diagonal Hessian with those methods.


#5

Awesome, thank you, that's good enough for my purposes for now; I'm just working through this paper and coding parts of it up to make sure I understand it. NB, you've got a misplaced end (it should come right after the return z).

I'd been trying something similar to your answer, but setting each element of xx to w in place and then restoring its original value before moving on to the next element. That avoids the allocations from vcat, but it causes ForwardDiff to choke; I think it's basically the same issue as here. So it probably can be optimized, but you know what they say about premature optimization!
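For what it's worth, the in-place idea can be rescued by copying x into a buffer whose element type matches the incoming dual number, instead of mutating the original Float64 vector (which is why ForwardDiff chokes). A sketch; diag_hessian_buffered is my name for it, not from the thread:

```julia
using ForwardDiff

# A Vector{Float64} can't hold ForwardDiff.Dual numbers, so allocate a buffer
# with the (possibly nested) dual element type and perturb one entry at a time.
function diag_hessian_buffered(f, x)
    res = similar(x)
    for i in 1:length(x)
        g = w -> begin
            buf = similar(x, typeof(w))  # element type follows the dual number
            copyto!(buf, x)
            buf[i] = w
            f(buf)
        end
        res[i] = ForwardDiff.derivative(v -> ForwardDiff.derivative(g, v), x[i])
    end
    return res
end

diag_hessian_buffered(x -> sum(x.^3), [1.0, 2.0])  # [6.0, 12.0]
```

This still allocates one buffer per inner call, so it is a sketch of the idea rather than a guaranteed speedup over the vcat version.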


#6

I tried that too, haha. I was playing with derivatives for thermodynamic equations and faced similar problems.


#7

If you know H is diagonal, then H*e gets you what you need, where e is the vector of all ones. So in theory the optimal method is to use forward-mode diff to compute the derivative of x -> sum(grad_f(x)), where grad_f is computed by reverse mode. If you're in dimension 2, however, just do nested forward diff to compute both diagonal elements.
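A pure-ForwardDiff sketch of the H*e idea (forward-over-forward rather than the forward-over-reverse suggested above, so the inner gradient still costs n forward passes, but it shows the principle):

```julia
using ForwardDiff

# d/dt grad_f(x + t*v) at t = 0 is the Hessian-vector product H*v;
# when H is diagonal, v = ones recovers exactly the diagonal.
hvp(f, x, v) = ForwardDiff.derivative(t -> ForwardDiff.gradient(f, x .+ t .* v), 0.0)

f(x) = sum(x.^3)  # Hessian is diagonal with entries 6x[i]
hvp(f, [1.0, 2.0, 3.0], ones(3))  # [6.0, 12.0, 18.0]
```

For a truly memory-light version on large problems, grad_f would come from a reverse-mode package as the post suggests, with one forward pass layered on top.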
