AD pipeline and Hessian–vector products

Nested AD (higher derivatives) like this can be a tricky subject, especially for large n where efficiency is important, but I think at this point there may be good solutions for the problem you are interested in.

If I understand your notation correctly, what you want is the following Hessian–vector product, which is equivalent to a directional derivative of \nabla f and can be efficiently implemented by forward-over-reverse AD:

\underbrace{\frac{\partial^2 f}{\partial x^2}}_\mathrm{Hessian} v= \left. \frac{d}{d\alpha} \left( \left. \nabla f \right|_{x+\alpha v} \right) \right|_{\alpha = 0}

where (for large n) \nabla f is implemented with reverse mode (e.g. via Enzyme.jl or Zygote.jl) and the scalar derivative d/d\alpha is implemented with forward mode (e.g. ForwardDiff.jl). In this way, you avoid ever explicitly computing the Hessian matrix, and the computational cost should be proportional to the cost of a single evaluation of f(x).

See e.g. the discussion and example here of a closely related problem: Nested AD with Lux etc - #12 by stevengj

For example, this works fine for me:

using LinearAlgebra
f(x) = norm(x)^3 * x[end]  + x[1]^2 # example ℝⁿ→ℝ function

import Zygote, ForwardDiff

# compute (∂²f/∂x²)v at x
function Hₓ(x, v)
    ∇f(y) = Zygote.gradient(f, y)[1]                    # reverse mode for the gradient
    return ForwardDiff.derivative(α -> ∇f(x + α*v), 0)  # forward mode for d/dα at α = 0
end
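
As a quick sanity check (not from the discussion above), you can compare the result against the dense Hessian for a small n, where forming it explicitly is still cheap:

# sanity check against the explicit (dense) Hessian — only feasible for small n
x, v = randn(5), randn(5)
Hₓ(x, v) ≈ ForwardDiff.hessian(f, x) * v   # should hold up to floating-point roundoff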

I couldn’t get the analogous thing to work with Enzyme, but maybe @wsmoses knows the trick.

Yeah, it seems like DifferentiationInterface.jl should include some kind of Hessian–vector product interface. It needs to be in a higher-level package like that one because it may involve combining multiple AD packages as I did above, and the implementation can be rather non-obvious (especially because not all AD combinations support nesting).
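
To illustrate the kind of thing such an interface would have to encapsulate, here is a hypothetical sketch (not DifferentiationInterface.jl's actual API): the caller supplies a reverse-mode gradient function, and the helper wraps a forward-mode directional derivative around it. The nesting is exactly where things can break, since the inner gradient has to accept ForwardDiff.Dual inputs.

# hypothetical helper, not an actual DifferentiationInterface.jl function:
# forward-over-reverse HVP, given any gradient function `grad` that can
# propagate ForwardDiff.Dual numbers through its computation
hvp(grad, x, v) = ForwardDiff.derivative(α -> grad(x + α*v), 0)

hvp(y -> Zygote.gradient(f, y)[1], x, v)   # same result as Hₓ(x, v) above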
