Higher-order derivatives in Flux

As I understand it, Flux.jl does not yet support the calculation of higher-order derivatives (please correct me if I’m wrong). Are there any plans to support this in the future? I opened an issue a while back, Flux.jl/issues/129, but it hasn’t generated any discussion. I have two use cases:

  • Calculating Hessian-vector products (see the issue for details)
  • Including gradient terms in a cost function (a regularization application)

If higher-order derivatives are not on the Flux horizon, I would very much appreciate suggestions on manual workarounds (ReverseDiff/ForwardDiff etc.) that could be integrated with an existing Flux-based workflow.

Thanks!


If all you need is the Hessian, then ReverseDiff.hessian(f,x) should do what you want. If you want higher derivatives, repeated application of ReverseDiff.jacobian will do the trick. For instance,

using ReverseDiff

f = x -> sum(exp.(x))^2                 # scalar-valued function of a vector
g = x -> ReverseDiff.gradient(f, x)     # first derivative
h = x -> ReverseDiff.jacobian(g, x)     # second derivative (Hessian)
ReverseDiff.jacobian(h, [1.0, -3.0])    # third derivative, flattened into a matrix

The last line will give you the third-order derivative tensor of f at [1.0, -3.0], flattened into a matrix, i.e., to get the actual tensor you would have to reshape the output.
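For example, a minimal sketch of that reshape, assuming ReverseDiff.jacobian vectorises the matrix-valued output of h column-major, so the flattened result is an n²×n matrix:

n  = 2
x0 = [1.0, -3.0]
T_flat = ReverseDiff.jacobian(h, x0)   # n^2 × n matrix
T = reshape(T_flat, n, n, n)           # with column-major vectorisation, T[i, j, k] ≈ ∂³f/∂xᵢ∂xⱼ∂xₖ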


Thanks, those methods are useful. What I’m concerned about is how to integrate them with the training loop and optimizers written in Flux. I see no straightforward way of making use of ReverseDiff Hessians with the Flux optimizers :confused:

I’ll let the Flux.jl people answer, but worst-case, you will have to write the ReverseDiff results into the tracked variable’s gradient field (.grad), e.g.,

X.grad[:] = ReverseDiff.hessian(f, X.data) * X.data + ReverseDiff.gradient(f, X.data)

if your objective is $(\nabla f)^\top x$. I don’t know whether that has any nasty side-effects.
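To make that concrete, here is a rough sketch of how the hand-filled gradient could drive a parameter update, assuming the old Tracker-based Flux API (param, .data, .grad); the plain gradient-descent step at the end just mimics what a Flux optimiser would do with the same .grad field, and f is a stand-in for whatever function appears in your objective:

using Flux, ReverseDiff

f = x -> sum(exp.(x))^2        # stand-in objective building block
X = param(randn(3))            # tracked parameter (Tracker-based Flux)
η = 0.1                        # step size

# gradient of (∇f)ᵀ x with respect to x is H(f)·x + ∇f
X.grad[:] = ReverseDiff.hessian(f, X.data) * X.data + ReverseDiff.gradient(f, X.data)

# a plain gradient-descent step that consumes the hand-filled gradient,
# the same way a Flux optimiser would read X.grad
X.data .-= η .* X.grad
X.grad .= 0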