As I understand it, Flux.jl does not yet support the calculation of higher-order derivatives (please correct me if I'm wrong). Are there any plans to support this in the future? I opened an issue a while back, Flux.jl/issues/129, but it hasn't generated any discussion. I have two use cases:
- Calculating Hessian-vector products (see the issue for details; a rough sketch of what I mean follows the list)
- Including gradient terms in a cost function (a regularization application)
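For concreteness, here is a minimal illustrative sketch (the toy f and the name hvp_dense are mine): forming the full Hessian with ReverseDiff and multiplying works, but it defeats the purpose of a Hessian-vector product, which should cost roughly one extra gradient evaluation.
using ReverseDiff

# Toy scalar-valued function and a dense Hessian-vector product.
# Fine for small x, but a proper HVP should avoid building the full Hessian.
f(x) = sum(exp.(x))^2
hvp_dense(x, v) = ReverseDiff.hessian(f, x) * v

hvp_dense([1.0, -3.0], [1.0, 0.0])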
If higher-order derivatives are not on the Flux horizon, I would very much appreciate suggestions on manual workarounds (ReverseDiff/ForwardDiff, etc.) that could integrate with an existing Flux-based workflow.
Thanks!
If all you need is the Hessian, then ReverseDiff.hessian(f, x) should do what you want. If you want higher derivatives, repeated application of ReverseDiff.jacobian will do the trick. For instance,
using ReverseDiff

f = x -> sum(exp.(x))^2
g = x -> ReverseDiff.gradient(f, x)
h = x -> ReverseDiff.jacobian(g, x)
ReverseDiff.jacobian(h, [1.0, -3.0])
The last line gives you the third-order derivative tensor of f at [1.0, -3.0], flattened into a matrix; to get the actual tensor, you have to reshape the output.
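For concreteness, a minimal self-contained sketch of that reshaping step (the index ordering below comes from Julia's column-major reshape; for a smooth f the tensor is symmetric anyway):
using ReverseDiff

f = x -> sum(exp.(x))^2
h = x -> ReverseDiff.jacobian(y -> ReverseDiff.gradient(f, y), x)  # Hessian as a function of x

x0 = [1.0, -3.0]
n  = length(x0)
T  = ReverseDiff.jacobian(h, x0)   # (n^2) × n matrix
D3 = reshape(T, n, n, n)           # D3[i, j, k] = ∂³f / ∂xᵢ ∂xⱼ ∂xₖ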
Thanks, those methods are useful. What I'm concerned about is how to integrate them with the training loop and optimizers written in Flux; I see no straightforward way of making use of ReverseDiff Hessians with the Flux optimizers.
I'll let the Flux.jl people answer, but worst case, you will have to write the ReverseDiff results into the tracked variable's gradient field, e.g.,
X.grad[:] = ReverseDiff.hessian(f, X.data) * X.data + ReverseDiff.gradient(f, X.data)
if your objective is (\nabla f)^T x. I don't know whether that has any nasty side effects.
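For completeness, a rough end-to-end sketch of that route (the toy f, the learning rate, and the hand-rolled update are mine, and I haven't checked it against any particular Flux version; it only assumes the tracked-array .data/.grad fields used above):
using Flux, ReverseDiff

f(x) = sum(exp.(x))^2          # inner function whose gradient enters the objective
X = param([1.0, -3.0])         # tracked parameter with X.data and X.grad fields
η = 0.01

for step in 1:100
    # ∇ₓ( ∇f(x)ᵀ x ) = ∇²f(x) x + ∇f(x), computed outside of Flux's tracker
    X.grad[:] = ReverseDiff.hessian(f, X.data) * X.data + ReverseDiff.gradient(f, X.data)
    X.data  .-= η .* X.grad    # plain gradient-descent step in place of a Flux optimizer
    X.grad[:] .= 0             # reset the accumulated gradient before the next step
end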