Hi all,
I have a function obj(p)
that solves for the parameter t of a Bezier curve B(t) so that the arclength is equal to some arbitrary value (2 in this kind-of-minimal example). It uses the Newton’s method.
The Bezier curve is parametrized by 4 3D points, so that the input vector p has 12 elements.
using LinearAlgebra
using StaticArrays
using GeometryBasics
using ForwardDiff
import ForwardDiff.value
using FiniteDiff
using QuadGK
# Solves the equation f(x) = 0 for x using Newton's method
function newton(f, df, x₀, difftol=1e-12, abstol=1e-16, maxiter=100)
x = x₀
it = 0
fx = f(x)
Δ = typemax(x)
while abs(fx) > abstol && abs(Δ) > difftol && it < maxiter
Δ = -fx/df(x)
x += Δ
fx = f(x)
it += 1
end
return it == maxiter ? error("Newton's method did not converge") : x
end
const Gx7, Wx7 = gauss(7) .|> SVector{7}
# Integrate a function f between a and b using 7 point Gauss quadrature
function integrate(f, a, b)
x = (b-a)/2 * Gx7 .+ (b+a)/2
w = (b-a)/2 * Wx7
return mapreduce((xi, wi) -> f(xi) * wi, +, x, w)
end
# Computes the arclength of a curve f(t) between t₀ and t
arclength(df, t₀, t) = integrate(x -> norm(f(x)), t₀, t)
# The derivative of the arclength function wrt t
d_arclength_dt(df, t) = norm(f(t))
# Find the arclength parameter t so that the arclength of the Bezier curve B(t) between 0 and t is 2.
function obj(p)
# Bezier curve control points
P₁ = Point3(p[1], p[2], p[3])
P₂ = Point3(p[4], p[5], p[6])
P₃ = Point3(p[7], p[8], p[9])
P₄ = Point3(p[10], p[11], p[12])
# Bezier curve equation
B(t) = (1-t)^3*P₁ + 3*(1-t)^2*t*P₂ + 3*(1-t)*t^2*P₃ + t^3*P₄
# Derivative wrt t
dB(t) = -3*(1-t)^2*P₁ + 3*(1-t)^2*P₂ - 6*(1-t)*t*P₂ + 6*(1-t)*t*P₃ - 3*t^2*P₃ + 3*t^2*P₄
# In order to find the parameter t that solves arclength = 2., we need to solve fun(x) = 0 with:
fun(x) = arclength(dB, 0., x) - 2.
d_fun_dx(x) = d_arclength_dt(dB, x)
# We can use Newton's method to solve fun(x) = 0
xstar = newton(fun, d_fun_dx, 0.)
return xstar
end
x0 = rand(12)
obj(x0)
I now want to compute the gradient and hessian of this function wrt the input vector p. This works fine using ForwardDiff
, as the newton solver is differentiable, but much better performance can be achieved by defining a custom rule using the implicit function theorem, as described here: An overview of Roots · Roots
# Specialized version of obj(p) for ForwardDiff.Dual numbers using the implicit function theorem
function obj(p::AbstractVector{T}) where T <: ForwardDiff.Dual
# control points
P₁ = Point3(p[1], p[2], p[3])
P₂ = Point3(p[4], p[5], p[6])
P₃ = Point3(p[7], p[8], p[9])
P₄ = Point3(p[10], p[11], p[12])
# Bezier curve equation
B(t) = (1-t)^3*P₁ + 3*(1-t)^2*t*P₂ + 3*(1-t)*t^2*P₃ + t^3*P₄
# Derivative wrt t
dB(t) = -3*(1-t)^2*P₁ + 3*(1-t)^2*P₂ - 6*(1-t)*t*P₂ + 6*(1-t)*t*P₃ - 3*t^2*P₃ + 3*t^2*P₄
# Derivative wrt t - real valued version
dB_real(t) = -3*(1-t)^2*value.(P₁) + 3*(1-t)^2*value.(P₂) - 6*(1-t)*t*value.(P₂) + 6*(1-t)*t*value.(P₃) - 3*t^2*value.(P₃) + 3*t^2*value.(P₄)
# In order to find the parameter t that solves arclength = 2., we need to solve fun(x) = 0 with:
fun(x) = arclength(dB_real, 0., x) - 2.
d_fun_dx(x) = d_arclength_dt(dB_real, x)
# Use the implicit function theorem to differentiate through the root finding process
# See https://juliamath.github.io/Roots.jl/dev/roots/#Sensitivity
xstar = newton(fun, d_fun_dx, 0.)
∂f∂x = d_fun_dx(xstar)
∂f∂p = arclength(dB, 0., xstar) - 2. # Evaluating fun with dB (which carries duals of the input vector p) instead of dB_real gives the derivative we are looking for
xdual = -∂f∂p/∂f∂x
return xdual
end
This also works quite well and can be checked using FiniteDiff
:
ForwardDiff.gradient(obj, x0) - FiniteDiff.finite_difference_gradient(obj, x0) # should be approx. zero - OK
However something strange happens with the hessian: when using the default value of 12 for the chunk size, everything looks fine:
cfg = ForwardDiff.HessianConfig(obj, x0, ForwardDiff.Chunk(12))
ForwardDiff.hessian(obj, x0, cfg) - FiniteDiff.finite_difference_hessian(obj, x0) # should be approx. zero - OK!
But the result depends on the chunk size. For example, this fails:
cfg = ForwardDiff.HessianConfig(obj, x0, ForwardDiff.Chunk(1))
ForwardDiff.hessian(obj, x0, cfg) - FiniteDiff.finite_difference_hessian(obj, x0) # should be approx. zero - NOT OK!
Note that I didn’t define a custom rule for the hessian, so it goes through the second obj
function. It could still be improved, but I don’t understand why the hessian is wrong when the chunk size is smaller than the length of the input vector.
Does anyone know what is wrong here?
Besides, if you could point me to an expression to write the custom rule for the hessian using the implicit function, that would be great! I could only find the expression for when p is a scalar.
Thanks!