AD pipeline and Hessian–vector products

Nested AD (higher derivatives) like this can be a tricky subject, especially for large n where efficiency is important, but I think at this point there may be good solutions for the problem you are interested in.

If I understand your notation correctly, what you want is the following Hessian–vector product, which is equivalent to a directional derivative of \nabla f and can be efficiently implemented by forward-over-reverse AD:

\underbrace{\frac{\partial^2 f}{\partial x^2}}_\mathrm{Hessian} v= \left. \frac{d}{d\alpha} \left( \left. \nabla f \right|_{x+\alpha v} \right) \right|_{\alpha = 0}

where (for large n) \nabla f is implemented with reverse mode (e.g. via Enzyme.jl or Zygote.jl) and the scalar derivative d/d\alpha is implemented with forward mode (e.g. ForwardDiff.jl). In this way, you avoid ever explicitly computing the Hessian matrix, and the computational cost should be proportional to the cost of a single evaluation of f(x).

See e.g. the discussion and example here of a closely related problem: Nested AD with Lux etc - #12 by stevengj

For example, this works fine for me:

using LinearAlgebra
f(x) = norm(x)^3 * x[end]  + x[1]^2 # example ℝⁿ→ℝ function

import Zygote, ForwardDiff

# compute (∂²f/∂x²)v at x
function Hₓ(x, v)
    ∇f(y) = Zygote.gradient(f, y)[1]                    # reverse mode for the gradient
    return ForwardDiff.derivative(α -> ∇f(x + α*v), 0)  # forward mode for d/dα at α = 0
end
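
As a quick sanity check (not from the discussion above), you can compare the result against the dense Hessian for a small n, where forming it explicitly is still cheap:

# sanity check against the explicit (dense) Hessian — only feasible for small n
x, v = randn(5), randn(5)
Hₓ(x, v) ≈ ForwardDiff.hessian(f, x) * v   # should hold up to floating-point roundoff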

I couldn’t get the analogous thing to work with Enzyme, but maybe @wsmoses knows the trick.

Yeah, it seems like DifferentiationInterface.jl should include some kind of Hessian–vector product interface. It needs to be in a higher-level package like that one because it may involve combining multiple AD packages as I did above, and the implementation can be rather non-obvious (especially because not all AD combinations support nesting).
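
To illustrate the kind of thing such an interface would have to encapsulate, here is a hypothetical sketch (not DifferentiationInterface.jl's actual API): the caller supplies a reverse-mode gradient function, and the helper wraps a forward-mode directional derivative around it. The nesting is exactly where things can break, since the inner gradient has to accept ForwardDiff.Dual inputs.

# hypothetical helper, not an actual DifferentiationInterface.jl function:
# forward-over-reverse HVP, given any gradient function `grad` that can
# propagate ForwardDiff.Dual numbers through its computation
hvp(grad, x, v) = ForwardDiff.derivative(α -> grad(x + α*v), 0)

hvp(y -> Zygote.gradient(f, y)[1], x, v)   # same result as Hₓ(x, v) above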
