Custom rule for differentiating through Newton solver using ForwardDiff: works for gradient, fails for hessian

Hi all,

I have a function obj(p) that solves for the parameter t of a Bezier curve B(t) so that the arclength is equal to some arbitrary value (2 in this kind-of-minimal example). It uses the Newton’s method.
The Bezier curve is parametrized by 4 3D points, so that the input vector p has 12 elements.

using LinearAlgebra
using StaticArrays
using GeometryBasics
using ForwardDiff
import ForwardDiff.value
using FiniteDiff
using QuadGK

# Solves the equation f(x) = 0 for x using Newton's method
function newton(f, df, x₀, difftol=1e-12, abstol=1e-16, maxiter=100)
	x = x₀
	it = 0
	fx = f(x)
	Δ = typemax(x)
	while abs(fx) > abstol && abs(Δ) > difftol && it < maxiter
		Δ = -fx/df(x)
		x += Δ
		fx = f(x)
		it += 1
	end
	return it == maxiter ? error("Newton's method did not converge") : x
end



const Gx7, Wx7 = gauss(7) .|> SVector{7}

# Integrate a function f between a and b using 7 point Gauss quadrature
function integrate(f, a, b)
	x = (b-a)/2 * Gx7 .+ (b+a)/2
	w = (b-a)/2 * Wx7
	return mapreduce((xi, wi) -> f(xi) * wi, +, x, w)
end


# Computes the arclength of a curve f(t) between t₀ and t
arclength(df, t₀, t) = integrate(x -> norm(f(x)), t₀, t)
# The derivative of the arclength function wrt t
d_arclength_dt(df, t) = norm(f(t))


# Find the arclength parameter t so that the arclength of the Bezier curve B(t) between 0 and t is 2.
function obj(p)

	# Bezier curve control points
	P₁ = Point3(p[1], p[2], p[3])
	P₂ = Point3(p[4], p[5], p[6])
	P₃ = Point3(p[7], p[8], p[9])
	P₄ = Point3(p[10], p[11], p[12])

	# Bezier curve equation
	B(t) = (1-t)^3*P₁ + 3*(1-t)^2*t*P₂ + 3*(1-t)*t^2*P₃ + t^3*P₄
	# Derivative wrt t
	dB(t) = -3*(1-t)^2*P₁ + 3*(1-t)^2*P₂ - 6*(1-t)*t*P₂ + 6*(1-t)*t*P₃ - 3*t^2*P₃ + 3*t^2*P₄

	# In order to find the parameter t that solves arclength = 2., we need to solve fun(x) = 0 with:
	fun(x) = arclength(dB, 0., x) - 2.
	d_fun_dx(x) = d_arclength_dt(dB, x)

	# We can use Newton's method to solve fun(x) = 0
	xstar = newton(fun, d_fun_dx, 0.)

	return xstar

end

x0 = rand(12)
obj(x0)

I now want to compute the gradient and hessian of this function wrt the input vector p. This works fine using ForwardDiff, as the newton solver is differentiable, but much better performance can be achieved by defining a custom rule using the implicit function theorem, as described here: An overview of Roots · Roots

# Specialized version of obj(p) for ForwardDiff.Dual numbers using the implicit function theorem
function obj(p::AbstractVector{T}) where T <: ForwardDiff.Dual

	# control points
	P₁ = Point3(p[1], p[2], p[3])
	P₂ = Point3(p[4], p[5], p[6])
	P₃ = Point3(p[7], p[8], p[9])
	P₄ = Point3(p[10], p[11], p[12])

	# Bezier curve equation
	B(t) = (1-t)^3*P₁ + 3*(1-t)^2*t*P₂ + 3*(1-t)*t^2*P₃ + t^3*P₄
	# Derivative wrt t
	dB(t) = -3*(1-t)^2*P₁ + 3*(1-t)^2*P₂ - 6*(1-t)*t*P₂ + 6*(1-t)*t*P₃ - 3*t^2*P₃ + 3*t^2*P₄
	# Derivative wrt t - real valued version
	dB_real(t) = -3*(1-t)^2*value.(P₁) + 3*(1-t)^2*value.(P₂) - 6*(1-t)*t*value.(P₂) + 6*(1-t)*t*value.(P₃) - 3*t^2*value.(P₃) + 3*t^2*value.(P₄)

	# In order to find the parameter t that solves arclength = 2., we need to solve fun(x) = 0 with:
	fun(x) = arclength(dB_real, 0., x) - 2.
	d_fun_dx(x) = d_arclength_dt(dB_real, x)

	# Use the implicit function theorem to differentiate through the root finding process
	# See https://juliamath.github.io/Roots.jl/dev/roots/#Sensitivity
	xstar = newton(fun, d_fun_dx, 0.)
	∂f∂x = d_fun_dx(xstar)
	∂f∂p = arclength(dB, 0., xstar) - 2. # Evaluating fun with dB (which carries duals of the input vector p) instead of dB_real gives the derivative we are looking for
	xdual = -∂f∂p/∂f∂x

	return xdual

end

This also works quite well and can be checked using FiniteDiff:

ForwardDiff.gradient(obj, x0) - FiniteDiff.finite_difference_gradient(obj, x0) # should be approx. zero - OK

However something strange happens with the hessian: when using the default value of 12 for the chunk size, everything looks fine:

cfg = ForwardDiff.HessianConfig(obj, x0, ForwardDiff.Chunk(12))
ForwardDiff.hessian(obj, x0, cfg) - FiniteDiff.finite_difference_hessian(obj, x0) # should be approx. zero - OK!

But the result depends on the chunk size. For example, this fails:

cfg = ForwardDiff.HessianConfig(obj, x0, ForwardDiff.Chunk(1))
ForwardDiff.hessian(obj, x0, cfg) - FiniteDiff.finite_difference_hessian(obj, x0) # should be approx. zero - NOT OK!

Note that I didn’t define a custom rule for the hessian, so it goes through the second obj function. It could still be improved, but I don’t understand why the hessian is wrong when the chunk size is smaller than the length of the input vector.

Does anyone know what is wrong here?

Besides, if you could point me to an expression to write the custom rule for the hessian using the implicit function, that would be great! I could only find the expression for when p is a scalar.

Thanks!

1 Like

You may find GitHub - gdalle/ImplicitDifferentiation.jl: Automatic differentiation of implicit functions and GitHub - ThummeTo/ForwardDiffChainRules.jl useful.

1 Like

Thanks for the recommendations! I ended up manually coding the gradient and hessian for the function and using the ForwardDiffChainRules.jl package to enforce this rule, now it’s working.