As I understand it, Flux.jl does not yet support the calculation of higher-order derivatives (please correct me if I'm wrong). Are there any plans to support this in the future? I opened an issue a while back, Flux.jl/issues/129, but it hasn't generated any discussion. I have two use cases:
- Calculating Hessian-vector products (see the issue for details; a rough sketch of what I mean follows the list)
- Including gradient terms in a cost function (a regularization application)
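For concreteness, here is a minimal illustrative sketch (the toy f and the name hvp_dense are mine): forming the full Hessian with ReverseDiff and multiplying works, but it defeats the purpose of a Hessian-vector product, which should cost roughly one extra gradient evaluation.
using ReverseDiff

# Toy scalar-valued function and a dense Hessian-vector product.
# Fine for small x, but a proper HVP should avoid building the full Hessian.
f(x) = sum(exp.(x))^2
hvp_dense(x, v) = ReverseDiff.hessian(f, x) * v

hvp_dense([1.0, -3.0], [1.0, 0.0])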
If higher-order derivatives are not on the Flux horizon, I would very much appreciate suggestions on manual workarounds (ReverseDiff/ForwardDiff, etc.) that could integrate with an existing Flux-based workflow.
Thanks!
If all you need is the Hessian, then ReverseDiff.hessian(f, x) should do what you want. If you want higher derivatives, repeated application of ReverseDiff.jacobian will do the trick. For instance,
using ReverseDiff

f = x -> sum(exp.(x))^2
g = x -> ReverseDiff.gradient(f, x)
h = x -> ReverseDiff.jacobian(g, x)
ReverseDiff.jacobian(h, [1.0, -3.0])
The last line gives you the third-order derivative tensor of f at [1.0, -3.0], flattened into a matrix; to get the actual tensor, you have to reshape the output.
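For concreteness, a minimal self-contained sketch of that reshaping step (the index ordering below comes from Julia's column-major reshape; for a smooth f the tensor is symmetric anyway):
using ReverseDiff

f = x -> sum(exp.(x))^2
h = x -> ReverseDiff.jacobian(y -> ReverseDiff.gradient(f, y), x)  # Hessian as a function of x

x0 = [1.0, -3.0]
n  = length(x0)
T  = ReverseDiff.jacobian(h, x0)   # (n^2) × n matrix
D3 = reshape(T, n, n, n)           # D3[i, j, k] = ∂³f / ∂xᵢ ∂xⱼ ∂xₖ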
Thanks, those methods are useful. What I'm concerned about is how to integrate them with the training loop and optimizers written in Flux; I see no straightforward way of making use of ReverseDiff Hessians with the Flux optimizers.
I'll let the Flux.jl people answer, but worst case, you will have to write the ReverseDiff results into the tracked variable's gradient field, e.g.,
X.grad[:] = ReverseDiff.hessian(f, X.data) * X.data + ReverseDiff.gradient(f, X.data)
if your objective is (\nabla f)^T x. I don't know whether that has any nasty side effects.
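For completeness, a rough end-to-end sketch of that route (the toy f, the learning rate, and the hand-rolled update are mine, and I haven't checked it against any particular Flux version; it only assumes the tracked-array .data/.grad fields used above):
using Flux, ReverseDiff

f(x) = sum(exp.(x))^2          # inner function whose gradient enters the objective
X = param([1.0, -3.0])         # tracked parameter with X.data and X.grad fields
η = 0.01

for step in 1:100
    # ∇ₓ( ∇f(x)ᵀ x ) = ∇²f(x) x + ∇f(x), computed outside of Flux's tracker
    X.grad[:] = ReverseDiff.hessian(f, X.data) * X.data + ReverseDiff.gradient(f, X.data)
    X.data  .-= η .* X.grad    # plain gradient-descent step in place of a Flux optimizer
    X.grad[:] .= 0             # reset the accumulated gradient before the next step
end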