Is there an efficient way to compute the Hessian of an NN?

Thank you both for your answers, and thank you for this nice paper; I understand AD much better now. I wanted to try both the RoF and double-forward methods, but I run into a stack overflow when I apply ForwardDiff to a neural network. Only the jacobian method accepts an AbstractArray for both input and output; the others assume one of the two is Real. However:

using Flux, ForwardDiff

f = Chain(Dense(10, 5, elu), Dense(5, 3))
x = rand(10)
ForwardDiff.jacobian(f, x)

This throws a stack overflow. I assume this method does n forward passes, one per input dimension. But the underlying ForwardDiff.gradient cannot take a function with multi-dimensional output. Isn't it weird for a forward-accumulation mode not to be able to do that? According to the paper by Baydin et al. it is quite straightforward to do. The same goes for ForwardDiff.derivative, which only handles functions with Real inputs.
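
In case it helps pin this down, here is a sketch of the same two-layer network written against plain arrays, bypassing the Flux layers entirely (W1, b1, W2, b2 are made-up stand-ins for the Chain's parameters, and my_elu mirrors Flux's elu):

using ForwardDiff

# Plain-array version of the Chain above; parameters are made-up stand-ins
my_elu(x, α = 1.0) = x > 0 ? x : α * (exp(x) - 1)

W1, b1 = randn(5, 10), randn(5)
W2, b2 = randn(3, 5), randn(3)
g(x) = W2 * my_elu.(W1 * x .+ b1) .+ b2

x = rand(10)
J = ForwardDiff.jacobian(g, x)   # 3×10 Jacobian

If jacobian handles this array-in/array-out closure, the overflow would seem to come from the interaction with the Flux layers rather than from ForwardDiff itself.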

Am I missing something?
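
For reference, the end goal is a double-forward Hessian of a scalar loss, along the lines of this sketch (loss is just a made-up sum-of-outputs wrapper so the outer function returns a Real, as gradient requires):

using ForwardDiff

# Made-up scalar wrapper around the network output
loss(x) = sum(g(x))   # g from the plain-array sketch above (or f, once it works)

∇loss = ForwardDiff.gradient(loss, x)                            # forward-mode gradient
H = ForwardDiff.jacobian(y -> ForwardDiff.gradient(loss, y), x)  # forward-over-forward
H ≈ ForwardDiff.hessian(loss, x)                                 # built-in nested version

If that is indeed the intended way to get a Hessian out of ForwardDiff, then the remaining question is just how to get the Chain itself to accept the dual numbers.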