Gradient of a gradient of a FastChain

What is the best way to differentiate an artificial neural network built by FastChain?

The objective is to compute the gradient of a neural network’s output with respect to its input, and the result should itself be differentiable.

Here is a dummy example.

using DiffEqFlux, Zygote
nn = FastChain(FastDense(1,32,tanh), FastDense(32,32,tanh), FastDense(32,1))
θ  = initial_params(nn)
Zygote.gradient(nn[1],0.1,θ)

It results in the following error

MethodError: no method matching getindex(::FastChain{Tuple{FastDense{typeof(tanh), DiffEqFlux.var"#initial_params#92"{Vector{Float32}}}, FastDense{typeof(tanh), DiffEqFlux.var"#initial_params#92"{Vector{Float32}}}, FastDense{typeof(identity), DiffEqFlux.var"#initial_params#92"{Vector{Float32}}}}}, ::Int64)

Stacktrace:
 [1] top-level scope
   @ In[3]:5
 [2] eval
   @ ./boot.jl:373 [inlined]
 [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196

Gradient of what? You’re not taking the gradient of a function.

Zygote.gradient(x -> nn(x,θ)[1],[0.1])
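
For reference, Zygote.gradient returns a 1-tuple holding the gradient with respect to the (vector) argument, so the scalar derivative at 0.1 can be pulled out like this (a minimal illustrative snippet, reusing the same model definition):

using DiffEqFlux, Zygote
nn = FastChain(FastDense(1,32,tanh), FastDense(32,32,tanh), FastDense(32,1))
θ  = initial_params(nn)
g  = Zygote.gradient(x -> nn(x,θ)[1], [0.1])  # a 1-tuple containing a 1-element vector
dnn_dx = g[1][1]                              # scalar derivative ∂nn/∂x at x = 0.1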

Thanks for providing the correct syntax. There is still a problem with DiffEqFlux.sciml_train; it might be related to differentiability, or it may again be a mistake in the syntax.

using DiffEqFlux, Zygote, LinearAlgebra
nn = FastChain(FastDense(1,32,tanh), FastDense(32,32,tanh), FastDense(32,1))
θ  = initial_params(nn)
cost(nn, θ) = reduce(vcat,first.(Zygote.gradient(x -> nn(x,θ)[1],[0.1])))
loss(θ) = cost(nn,θ)
DiffEqFlux.sciml_train(loss, ADAM, θ, maxiters=2)

Any idea how to fix this?

Btw, the input of the loss() function is the optimization parameter vector θ. Is there a way to use the cost() function directly in DiffEqFlux.sciml_train?

The loss should be a scalar.


Isn’t it a scalar in the previous example?

Are you trying to take a gradient of a gradient of a FastChain?

using DiffEqFlux, Zygote, LinearAlgebra
nn = FastChain(FastDense(1,32,tanh), FastDense(32,32,tanh), FastDense(32,1))
θ  = initial_params(nn)
cost(nn, θ) = Zygote.gradient(x -> nn(x,θ)[1],[0.1])[1]
loss(θ) = cost(nn,θ)
DiffEqFlux.sciml_train(loss, θ, ADAM(0.1), maxiters=2)

That’s very different from a gradient of a FastChain.


To be precise, I want to use a gradient of a FastChain in a loss function. Many times the AD was failing at DiffEqFlux.sciml_train(), which basically means that the gradient of a gradient is failing somewhere. Any idea how to fix the simple example?

Okay yes, this is a gradient of a gradient example, not a gradient example. The gradient of a gradient of a FastChain won’t work because the FastChain adjoint uses mutation. It could be specialized to handle this case, but because this is almost never an efficient way to calculate the second derivative (forward-over-adjoint is just better in almost all respects) I’m not sure it’s a high priority to specialize this.
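
As a rough sketch of what forward-over-adjoint looks like on a plain scalar function (the test function and the hvp helper below are just illustrative, not the FastChain case): the inner gradient is reverse mode via Zygote, and the outer derivative is a single forward-mode sweep via ForwardDiff along a direction v, which gives a Hessian-vector product without ever doing reverse-over-reverse.

using ForwardDiff, Zygote

f(x) = sum(sin, x) + sum(abs2, x)   # arbitrary scalar test function; its Hessian is Diagonal(2 .- sin.(x))

# Forward-over-adjoint Hessian-vector product: push a dual number through the
# reverse-mode gradient in the direction v.
hvp(f, x, v) = ForwardDiff.derivative(ε -> Zygote.gradient(f, x .+ ε .* v)[1], 0.0)

x = [0.3, 0.7, 1.1]
v = [1.0, 0.0, 0.0]
hvp(f, x, v)  # ≈ (2 .- sin.(x)) .* v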

(And BTW, taking a gradient of a gradient is a TensorFlow misnomer. It’s actually the Jacobian of a gradient, unless it’s a scalar function and thus the second derivative; otherwise the sizes don’t align. TensorFlow silently makes gradient = sum of the Jacobian, as I describe here: Gradient of Gradient in Zygote - #3 by ChrisRackauckas. You should really double check whether that summation is the interpretation you wanted.)
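
A small illustration of that summation point, using nested ForwardDiff so it sidesteps the Zygote-over-Zygote problem entirely (the function here is just a toy): the “gradient of the gradient” of a scalar function is really the gradient of the summed gradient, i.e. the Hessian times a vector of ones, not the Hessian itself.

using ForwardDiff

f(x) = x[1]^2 * x[2] + sin(x[2])
x = [1.0, 2.0]

H = ForwardDiff.hessian(f, x)
g_of_g = ForwardDiff.gradient(y -> sum(ForwardDiff.gradient(f, y)), x)
g_of_g ≈ H * ones(2)   # true: the implicit sum collapses the Jacobian of the gradient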


reduce(vcat, ...) won’t return a scalar in general.
But requiring a gradient of a gradient with the fast chains is a secondary issue, as Chris R pointed out.


Interesting. Can you point to a source where I can understand this? I’ve written rrules for rrules with great success (I thought …), with performance reasonably close to the first rrule, which indicated to me that any other approach couldn’t really be more performant… But maybe I have a special case, or maybe I misunderstood something?

EDIT: never mind, I see your link above has some discussion … I will peruse it.

Is this topic relevant? The model in their example is based on Chain instead of FastChain.

Thanks for the additional information on TensorFlow. Maybe this is one of the reasons why people in PINNs use multiple neural networks, one for each variable? A sum over the Jacobian is clearly not the best approach.

Understood. But in this example there is only one element in first.(Zygote.gradient(x -> nn(x,θ)[1],[0.1])), so the output of reduce(vcat, ...) is a scalar. Is this still a problem here? Removing reduce(vcat, ...) changes the error.

It’s the same issue: in both cases what’s happening is that the adjoint definition itself is not Zygote-differentiable, which is where the double differentiation problem comes into play. The solution is also the same in both cases: you probably want to use forward-over-adjoint anyway (replying to @cortner, IIRC there’s a lengthy discussion in Griewank’s AD book about how double reverse mode is almost never optimal), and that thread shows how to do FoA second derivatives by mixing ForwardDiff.jl into Zygote.jl. So my suggestion is the same. In this case it’s actually not too hard to fix the double adjoint, but you still wouldn’t want to use it even if I fixed it, so… ehh… maybe later.

Yes, we discuss this a bit in the weird paper-y write-up thingy we wrote on NeuralPDE.jl (not quite a paper, not quite a review, but full of relevant information). Splitting into separate neural networks decreases the asymptotic cost of differentiation for systems of PDEs. You could still use a few tricks to make the neural networks share weights/layers in this form BTW (I should write an example for how to do that), but putting the networks together both has a higher cost and a tendency to compute things that aren’t needed (for example, if you want AD to compute the 2nd derivative it would compute the second derivative w.r.t. every output, even if only one dependent variable was undergoing diffusion).
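
One possible sketch of that weight-sharing trick (all names and the particular split of θ below are just illustrative): build two heads on top of a shared FastChain trunk and have both read the trunk’s slice of the same parameter vector.

using DiffEqFlux

trunk  = FastChain(FastDense(1, 32, tanh), FastDense(32, 32, tanh))
head_u = FastDense(32, 1)
head_v = FastDense(32, 1)

n_trunk = length(initial_params(trunk))
n_head  = length(initial_params(head_u))
θ = vcat(initial_params(trunk), initial_params(head_u), initial_params(head_v))

# Both outputs evaluate the trunk with θ[1:n_trunk], so those weights are shared.
u(x, θ) = head_u(trunk(x, θ[1:n_trunk]), θ[n_trunk+1:n_trunk+n_head])
v(x, θ) = head_v(trunk(x, θ[1:n_trunk]), θ[n_trunk+n_head+1:end])

u([0.1], θ), v([0.1], θ)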

So yes, there’s a lot going on in AD space to improve PINNs. Mixing the networks is not asymptotically good. Reverse over reverse is not good, so you want to mix forward with reverse. Standard AD mixing without extra tricks is not good to higher order (see the stuff on Taylor-mode AD: you need to use things like that to make 3rd derivatives scale much better, and even the heat equation has 3rd derivatives once you count the extra derivative from the loss function; this is why Diffractor.jl exists). The Julia Lab will be spending a good part of next year demonstrating how handling all of these more effectively together improves PINN training performance, but it’s not all ready right now.


How is it possible to use FoA with sciml_train()? Based on the simple example provided earlier, the cost function is using Zygote.gradient(). Is there an option in sciml_train() to use ForwardDiff in the optimization process?

It’s the same thing. Just use ForwardDiff.jl in the cost function.

Do you mean something like this:

using DiffEqFlux, ForwardDiff, LinearAlgebra
nn = FastChain(FastDense(1,32,tanh), FastDense(32,32,tanh), FastDense(32,1))
θ  = initial_params(nn)
cost(θ_) = ForwardDiff.gradient(x -> nn(x,θ_)[1],[0.1])[1]
DiffEqFlux.sciml_train(cost, θ, ADAM(0.1), maxiters=2)

It results in the following error.

MethodError: Cannot `convert` an object of type Nothing to an object of type Float32

using DiffEqFlux, ForwardDiff, Zygote, LinearAlgebra
nn = FastChain(FastDense(1,32,tanh), FastDense(32,32,tanh), FastDense(32,1))
θ  = initial_params(nn)
function cost(θ_)
  f = x -> nn(x,θ_)[1]
  x = [0f-1]
  sum(ForwardDiff.gradient(f, x))
end
Zygote.gradient(cost,θ) # nothing

That shouldn’t be a nothing. It looks like it’s an issue with Zygote’s overload of ForwardDiff.gradient. MWE:

using Zygote, ForwardDiff
θ  = rand(2,2)
function cost(θ_)
  f = x -> sum(θ_*x)
  x = [1.0,2.0]
  sum(ForwardDiff.gradient(f, x))
end
Zygote.gradient(cost,θ) # nothing

@mcabbott it looks like this would’ve been covered by the same thing as https://github.com/FluxML/Zygote.jl/issues/953#issuecomment-841882071 . Maybe you have an idea what’s going on here?


Yes, ForwardDiff.gradient(f, x) does not keep a gradient with respect to f.

Maybe that could be done better; there was a thread of attempts somewhere.

Well interesting. @simeonschaub is this a good Diffractor case or still too early?

Is there any update for this thread? Do we have a friendly way to use the derivative of a neural network in the cost function?
