# Custom rule for differentiating through Newton solver using ForwardDiff: works for gradient, fails for hessian

Hi all,

I have a function `obj(p)` that solves for the parameter t of a Bezier curve B(t) so that the arclength is equal to some arbitrary value (2 in this kind-of-minimal example). It uses the Newton’s method.
The Bezier curve is parametrized by 4 3D points, so that the input vector p has 12 elements.

``````using LinearAlgebra
using StaticArrays
using GeometryBasics
using ForwardDiff
import ForwardDiff.value
using FiniteDiff

# Solves the equation f(x) = 0 for x using Newton's method
function newton(f, df, x₀, difftol=1e-12, abstol=1e-16, maxiter=100)
x = x₀
it = 0
fx = f(x)
Δ = typemax(x)
while abs(fx) > abstol && abs(Δ) > difftol && it < maxiter
Δ = -fx/df(x)
x += Δ
fx = f(x)
it += 1
end
return it == maxiter ? error("Newton's method did not converge") : x
end

const Gx7, Wx7 = gauss(7) .|> SVector{7}

# Integrate a function f between a and b using 7 point Gauss quadrature
function integrate(f, a, b)
x = (b-a)/2 * Gx7 .+ (b+a)/2
w = (b-a)/2 * Wx7
return mapreduce((xi, wi) -> f(xi) * wi, +, x, w)
end

# Computes the arclength of a curve f(t) between t₀ and t
arclength(df, t₀, t) = integrate(x -> norm(f(x)), t₀, t)
# The derivative of the arclength function wrt t
d_arclength_dt(df, t) = norm(f(t))

# Find the arclength parameter t so that the arclength of the Bezier curve B(t) between 0 and t is 2.
function obj(p)

# Bezier curve control points
P₁ = Point3(p, p, p)
P₂ = Point3(p, p, p)
P₃ = Point3(p, p, p)
P₄ = Point3(p, p, p)

# Bezier curve equation
B(t) = (1-t)^3*P₁ + 3*(1-t)^2*t*P₂ + 3*(1-t)*t^2*P₃ + t^3*P₄
# Derivative wrt t
dB(t) = -3*(1-t)^2*P₁ + 3*(1-t)^2*P₂ - 6*(1-t)*t*P₂ + 6*(1-t)*t*P₃ - 3*t^2*P₃ + 3*t^2*P₄

# In order to find the parameter t that solves arclength = 2., we need to solve fun(x) = 0 with:
fun(x) = arclength(dB, 0., x) - 2.
d_fun_dx(x) = d_arclength_dt(dB, x)

# We can use Newton's method to solve fun(x) = 0
xstar = newton(fun, d_fun_dx, 0.)

return xstar

end

x0 = rand(12)
obj(x0)
``````

I now want to compute the gradient and hessian of this function wrt the input vector p. This works fine using `ForwardDiff`, as the newton solver is differentiable, but much better performance can be achieved by defining a custom rule using the implicit function theorem, as described here: An overview of Roots · Roots

``````# Specialized version of obj(p) for ForwardDiff.Dual numbers using the implicit function theorem
function obj(p::AbstractVector{T}) where T <: ForwardDiff.Dual

# control points
P₁ = Point3(p, p, p)
P₂ = Point3(p, p, p)
P₃ = Point3(p, p, p)
P₄ = Point3(p, p, p)

# Bezier curve equation
B(t) = (1-t)^3*P₁ + 3*(1-t)^2*t*P₂ + 3*(1-t)*t^2*P₃ + t^3*P₄
# Derivative wrt t
dB(t) = -3*(1-t)^2*P₁ + 3*(1-t)^2*P₂ - 6*(1-t)*t*P₂ + 6*(1-t)*t*P₃ - 3*t^2*P₃ + 3*t^2*P₄
# Derivative wrt t - real valued version
dB_real(t) = -3*(1-t)^2*value.(P₁) + 3*(1-t)^2*value.(P₂) - 6*(1-t)*t*value.(P₂) + 6*(1-t)*t*value.(P₃) - 3*t^2*value.(P₃) + 3*t^2*value.(P₄)

# In order to find the parameter t that solves arclength = 2., we need to solve fun(x) = 0 with:
fun(x) = arclength(dB_real, 0., x) - 2.
d_fun_dx(x) = d_arclength_dt(dB_real, x)

# Use the implicit function theorem to differentiate through the root finding process
# See https://juliamath.github.io/Roots.jl/dev/roots/#Sensitivity
xstar = newton(fun, d_fun_dx, 0.)
∂f∂x = d_fun_dx(xstar)
∂f∂p = arclength(dB, 0., xstar) - 2. # Evaluating fun with dB (which carries duals of the input vector p) instead of dB_real gives the derivative we are looking for
xdual = -∂f∂p/∂f∂x

return xdual

end
``````

This also works quite well and can be checked using `FiniteDiff`:

``````ForwardDiff.gradient(obj, x0) - FiniteDiff.finite_difference_gradient(obj, x0) # should be approx. zero - OK
``````

However something strange happens with the hessian: when using the default value of 12 for the chunk size, everything looks fine:

``````cfg = ForwardDiff.HessianConfig(obj, x0, ForwardDiff.Chunk(12))
ForwardDiff.hessian(obj, x0, cfg) - FiniteDiff.finite_difference_hessian(obj, x0) # should be approx. zero - OK!
``````

But the result depends on the chunk size. For example, this fails:

``````cfg = ForwardDiff.HessianConfig(obj, x0, ForwardDiff.Chunk(1))
ForwardDiff.hessian(obj, x0, cfg) - FiniteDiff.finite_difference_hessian(obj, x0) # should be approx. zero - NOT OK!
``````

Note that I didn’t define a custom rule for the hessian, so it goes through the second `obj` function. It could still be improved, but I don’t understand why the hessian is wrong when the chunk size is smaller than the length of the input vector.

Does anyone know what is wrong here?

Besides, if you could point me to an expression to write the custom rule for the hessian using the implicit function, that would be great! I could only find the expression for when p is a scalar.

Thanks!

1 Like
1 Like

Thanks for the recommendations! I ended up manually coding the gradient and hessian for the function and using the `ForwardDiffChainRules.jl` package to enforce this rule, now it’s working.