Gradient of a gradient of a FastChain

ChrisRackauckas · January 1, 2022, 9:23pm

Okay yes, this is a gradient-of-a-gradient example, not a gradient example. The gradient of a gradient of a FastChain won’t work because the FastChain adjoint uses mutation. It could be specialized to handle this case, but this is almost never an efficient way to calculate the second derivative (forward-over-adjoint is just better in almost all respects), so I’m not sure specializing it is a high priority.
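To make "forward-over-adjoint" concrete, here is a minimal sketch in plain Python rather than Julia, so it is an illustration of the idea and not the FastChain or Zygote code path. Everything in it is hypothetical: the `Dual` class, the toy function f(x) = x0^2 * x1 + x1^3, and the hand-written `grad_f` that stands in for what a reverse-mode adjoint would return. The point is that the gradient is computed once by the adjoint, and the second derivative comes from pushing forward-mode dual numbers through that gradient instead of differentiating the adjoint in reverse mode again.

```python
class Dual:
    """Number val + tan*eps with eps**2 == 0 (a forward-mode tangent carrier)."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule on the tangent part.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(x)


def grad_f(x):
    """Hand-written gradient of f(x) = x0^2 * x1 + x1^3.

    Assumption: this stands in for the output of a reverse-mode adjoint.
    """
    return [2 * x[0] * x[1], x[0] * x[0] + 3 * x[1] * x[1]]


def hvp(grad, x, v):
    """Hessian-vector product: forward mode applied over the gradient."""
    duals = [Dual(xi, vi) for xi, vi in zip(x, v)]
    return [g.tan for g in grad(duals)]


def hessian(grad, x):
    """Full Hessian, one forward pass over the gradient per column."""
    n = len(x)
    cols = [hvp(grad, x, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


print(hvp(grad_f, [1.0, 2.0], [1.0, 0.0]))  # [4.0, 2.0]
print(hessian(grad_f, [1.0, 2.0]))          # [[4.0, 2.0], [2.0, 12.0]]
```

Each column of the Hessian costs one extra forward pass through the gradient, and nothing here requires the adjoint itself to be differentiable in reverse mode, which is the property that matters when the adjoint mutates.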

(And BTW, taking a gradient of a gradient is a TensorFlow misnomer. It’s actually the Jacobian of a gradient, unless it’s a scalar function, in which case it’s the second derivative. Otherwise the sizes don’t align. TensorFlow silently makes gradient = sum of Jacobian, as I describe here: Gradient of Gradient in Zygote - #3 by ChrisRackauckas. You should really double check whether that summation is the interpretation you wanted.)
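To spell out the size argument, here is a short sketch in my own notation (the symbols below are assumptions for illustration, not taken from the linked post). For a scalar-valued function of n inputs, the gradient is itself vector-valued, so its derivative is a matrix:

$$
f:\mathbb{R}^n\to\mathbb{R},\qquad \nabla f:\mathbb{R}^n\to\mathbb{R}^n,\qquad J_{\nabla f}(x)=H_f(x)\in\mathbb{R}^{n\times n}.
$$

A "gradient" routine that sums over the outputs before differentiating returns a vector instead:

$$
\nabla_x\Big(\sum_{i=1}^{n}\frac{\partial f}{\partial x_i}(x)\Big)=H_f(x)^{\top}\mathbf{1}=H_f(x)\,\mathbf{1}\in\mathbb{R}^{n},
$$

where the last equality assumes f is twice continuously differentiable, so that the Hessian is symmetric. Only for n = 1 do the two coincide, both reducing to f''(x).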
