Higher order derivatives in Flux

As I understand it, Flux.jl does not yet support the calculation of higher-order derivatives (please correct me if I’m wrong). Are there any plans to support this in the future? I opened an issue a while back (Flux.jl/issues/129), but it hasn’t generated any discussion. I have two use cases:

  • Calculating Hessian-vector products (see the issue for details)
  • Including gradient terms in a cost function (a regularization application)

If higher-order derivatives are not on the Flux horizon, I would very much appreciate suggestions on manual workarounds (ReverseDiff/ForwardDiff, etc.) that could integrate with an existing Flux-based workflow.

Thanks!

If all you need is the Hessian, then ReverseDiff.hessian(f,x) should do what you want. If you want higher derivatives, repeated application of ReverseDiff.jacobian will do the trick. For instance,

using ReverseDiff

f = x -> sum(exp.(x))^2               # scalar-valued test function
g = x -> ReverseDiff.gradient(f, x)   # first derivative (gradient)
h = x -> ReverseDiff.jacobian(g, x)   # second derivative (Hessian)
ReverseDiff.jacobian(h, [1.0, -3.0])  # third derivative, flattened into a matrix

The last line gives you the third-order derivative tensor of f at [1.0, -3.0], flattened into a matrix; to recover the actual tensor, you have to reshape the output.
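For example, with the length-2 input above the result should be a 4×2 matrix (as far as I can tell, ReverseDiff vec-flattens the 2×2 Hessian before differentiating it again), so a reshape along these lines recovers the 2×2×2 tensor:

T = ReverseDiff.jacobian(h, [1.0, -3.0])  # 4×2 matrix: vec(Hessian) differentiated w.r.t. x
D3 = reshape(T, 2, 2, 2)                  # third-order derivative tensor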

Thanks, those methods are useful. What I’m concerned about is how to integrate them with the training loop and optimizers written in Flux. I see no straightforward way of making use of ReverseDiff Hessians with the Flux optimizers :confused:

I’ll let the Flux.jl people answer, but worst case, you could write the ReverseDiff results into the tracked variable’s gradient (.grad) field, e.g.,

X.grad[:] = ReverseDiff.hessian(f, X.data) * X.data + ReverseDiff.gradient(f, X.data)

if your objective is (\nabla f)^T x. I don’t know whether that has any nasty side-effects.
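To make that concrete, here is a minimal sketch of what such a manual step could look like. It assumes the Tracker-era Flux API (parameters created with Flux.param exposing .data and .grad), and the hand-written gradient-descent update at the end is just a stand-in for whatever Flux optimizer you are actually using:

using Flux, ReverseDiff

# Hypothetical setup: X is a Flux tracked parameter, and the objective is
# (∇f)ᵀ x as discussed above.
f = x -> sum(exp.(x))^2
X = Flux.param([1.0, -3.0])

# Compute the gradient of the objective with ReverseDiff and write it into
# the field that Flux's optimizers read from.
X.grad[:] = ReverseDiff.hessian(f, X.data) * X.data + ReverseDiff.gradient(f, X.data)

# Manual descent step in place of a Flux optimizer call; reset .grad so stale
# gradients don't accumulate on the next iteration.
X.data .-= 0.1 .* X.grad
X.grad .= 0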