Autodiff-ing a function defined by the result of Roots.find_zero fails

joaovgranja · September 24, 2022, 5:18pm

Hi all!

I am trying to ForwardDiff a function that is defined by solving a non-linear equation that depends on parameters. The code below reproduces the error. It first redefines exp by solving log(x) - a = 0

using Roots, ForwardDiff

f( x, a ) = log( x ) - a
∂f_x( x, a ) = ForwardDiff.derivative( t -> f( t, a ), x )

mexp( a ) = find_zero(
    (
        x -> f( x, a ),
        x -> ∂f_x( x, a )
    ),
    1, Roots.Newton()
)

# Test
mexp( log( π ) ) ≈ π # true

# Differentiate
dmexp(a) = ForwardDiff.derivative( mexp, a )
dmexp( 1 ) # error!

Full stacktrace:

ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Float64, 1})
Closest candidates are:
  (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  (::Type{T})(::T) where T<:Number at boot.jl:772
  (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
  ...
Stacktrace:
  [1] convert(#unused#::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Float64, 1})
    @ Base ./number.jl:7
  [2] init_state(M::Roots.Newton, F::Roots.Callable_Function{Val{2}, Val{true}, Tuple{var"#11#13"{ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1}}, var"#12#14"{ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1}}}, Nothing}, x::Int64)
    @ Roots ~/.julia/packages/Roots/LYDH4/src/Derivative/newton.jl:63
  [3] init(𝑭𝑿::ZeroProblem{Tuple{var"#11#13"{ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1}}, var"#12#14"{ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1}}}, Int64}, M::Roots.Newton, p′::Nothing; p::Nothing, verbose::Bool, tracks::Roots.NullTracks, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Roots ~/.julia/packages/Roots/LYDH4/src/find_zero.jl:283
  [4] solve(𝑭𝑿::ZeroProblem{Tuple{var"#11#13"{ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1}}, var"#12#14"{ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1}}}, Int64}, M::Roots.Newton, p::Nothing; verbose::Bool, kwargs::Base.Pairs{Symbol, Union{Nothing, Roots.NullTracks}, Tuple{Symbol, Symbol}, NamedTuple{(:p, :tracks), Tuple{Nothing, Roots.NullTracks}}})
    @ Roots ~/.julia/packages/Roots/LYDH4/src/find_zero.jl:469
  [5] find_zero(f::Tuple{var"#11#13"{ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1}}, var"#12#14"{ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1}}}, x0::Int64, M::Roots.Newton; p::Nothing, verbose::Bool, tracks::Roots.NullTracks, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Roots ~/.julia/packages/Roots/LYDH4/src/find_zero.jl:215
  [6] find_zero(f::Tuple{var"#11#13"{ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1}}, var"#12#14"{ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1}}}, x0::Int64, M::Roots.Newton)
    @ Roots ~/.julia/packages/Roots/LYDH4/src/find_zero.jl:206
  [7] mexp(a::ForwardDiff.Dual{ForwardDiff.Tag{typeof(mexp), Int64}, Int64, 1})
    @ Main ~/Dropbox/Research/dynamic_games/hvb/tests.jl:21
  [8] derivative
    @ ~/.julia/packages/ForwardDiff/pDtsf/src/derivative.jl:14 [inlined]
  [9] dmexp(a::Int64)
    @ Main ~/Dropbox/Research/dynamic_games/hvb/tests.jl:31
 [10] top-level scope
    @ ~/Dropbox/Research/dynamic_games/hvb/tests.jl:32

Output of versioninfo:

Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i9-12900HK
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
  Threads: 10 on 20 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 10

Roots v2.0.4 and ForwardDiff v0.10.32.

Many thanks for your help!

stevengj · September 24, 2022, 5:28pm

You really don’t want to do this, even if it works — you should implement the chain rule yourself using the implicit-function theorem, or use a package like ImplicitDifferentiation.jl. See also sections 3 and 5 of these notes. (Unless the Roots.jl package does this for you, which it does not currently: Roots.jl#325.)

The problem is that applying AD naively to a nonlinear solver will waste a lot of effort trying to propagate the derivatives through all of the intermediate steps, in order to exactly differentiate the error in the solution (from the finite number of steps). If you assume that the solution is converged to high precision, you can do much better by directly applying the chain rule to the equation being solved.

tcovert · September 24, 2022, 8:32pm

I recently had a similar problem - trying to ForwardDiff through a root finding step within a function that has a ton of other steps inside it, and ForwardDiff did fail (see here: confused by this root finding example · Issue #595 · JuliaDiff/ForwardDiff.jl · GitHub). Note that I was using Bisection instead of the default Order0 method, and instead of erroring out here, like in the above example, Bisection just silently fails and reports incorrect derivatives.

I am aware of the ImplicitDifferential package but as far as I can tell it only works with reverse mode approaches, which were not working in my setup for other reasons. More generally, I do know about the broader idea of computing derivatives myself using the chain rule + implicit differentiation, but given that the root finding step was but one of many steps in a much longer function that I needed a gradient of, I ended up giving up and just using finite differences, which I know is an even worse sin

For what its worth - the fact that ForwardDiff silently fails here, instead of explicitly failing, was pretty frustrating, and was something that took me forever to figure out, since I had the (apparently) mistaken impression that if ForwardDiff gives you a derivative, its giving you the right one.

Instead of making this response purely a rant, I will try to turn it into a productive question. Suppose a user like me is hoping to use ForwardDiff derivatives. Is there any (approximately) canned approach to doing what ImplicitDifferentiation is doing, but stay within the ForwardDiff universe? If I have a function f(x,theta) with many intermediate steps, in which step n of N is to perform a root finding operation, must I break f into an f1 which does the first n-1 steps, an f2 which does the root finding step, and an f3 which does the remaining steps? Should I have ForwardDiff create gradients for f1 and f3, manually work out the implicit gradient in f2`, and then manually combine those gradients to get at my complete gradient?

j_verzani · September 24, 2022, 10:44pm

Until we get to the issue @stevengj opened, you can make this work in the short term by using one(a) as the initial value instead of 1 in your mexp function.

j_verzani · September 24, 2022, 11:13pm

That Bisection error is odd. It arises from the number of steps taken by that algorithm and some interaction with ForwrdDiff. I’m unclear where that bug lurks. Anyways, until a better solution is implemented, avoiding Bisection is suggested (use the default A42 for bracketing).

joaovgranja · September 25, 2022, 10:31am

Interesting. Can you explain why?

Your comment generalizes, and I have no idea why. Can you shed some light on this?

f( x, a ) = log( x ) - a
∂f_x( x, a ) = ForwardDiff.derivative( t -> f( t, a ), x )

mexp( a, x0 ) = find_zero(
    (
        x -> f( x, a ),
        x -> ∂f_x( x, a )
    ),
    convert( eltype( a ), x0 ), # New
Roots.Newton()
)

# Test
mexp( log( π ), 1.0 ) ≈ π

# Differentiate
dmexp( a, x0 ) = ForwardDiff.derivative( a -> mexp( a, x0 ), a )
dmexp( 1.0, 1.0 )
dmexp( log( 17.0 ), 1.0 )

In light of @stevengj’s response, I will probably just implement these derivatives myself. But I’m curious as to why this version works.

ChrisRackauckas · September 25, 2022, 12:08pm

As an algorithm, Bisection is not differentiable because there is no continuous connection from an input to an output to collect the duals on. That is not the case for Falsi or Newton type methods. But of course, you don’t need to actually differentiate the algorithm so it’s a moot point.

joaovgranja · September 25, 2022, 2:29pm

I’m not sure I understand your point here. Sure, you don’t have to differentiate the algorithm (which afaiu is what ForwardDiff would be doing in this example), but perhaps you can. [The ability to do so comes up as an example of the power of autodiff here (also available as notes somewhere)]. If this feature is available, and because programmer time is a very scarce resource, we may want to exploit it if the result is sufficiently efficient for one’s purpose.

ChrisRackauckas · September 25, 2022, 3:59pm

No you cannot. There is no arithmetic connection between the input and the answer when you use bisection. Think about it. What is the derivative of the root with respect to the brackets? Zero because as long as the root is in there you get the same solution. So now run the algorithm. Is it in the first half or the second? Midpoint is calculated using the average of the top and bottom, so duals of the brackets are transferred and scaled to the midpoint. Keep doing it. Therefore the duals are the derivatives of the brackets. Therefore when it tells you that they are zero it is correct: it correctly computed that the solution is constant with respect to the brackets.

It requires a manual fix using the implicit differentiation rule to handle, and a definition of what are parameters to f, which is precisely why the SciML tools have such explicit parameterization (see NonlinearSolve.jl)

joaovgranja · September 25, 2022, 4:16pm

I’m not using bisection. When you said

I understood you were referring to Newton-like methods. In that case you can differentiate the algorithm. As you say, you don’t have to, but it does save programmer time if one can just ForwardDiff (or an alternative autodiff) a function as the one defined above. So the point is not moot.

I understood your comment as “there are other (perhaps theoretically sounder) tools for the job, so there’s no need to talk about this particular tool”. I think this particular tool is still useful. Apologies if I misunderstood you.

ChrisRackauckas · September 28, 2022, 11:43am

For some reason I thought you were calling Bisection.

Yes, Newton-like methods are autodiff-able. Just use NonlinearSolve.jl whose implementation is differentiable, and it has overloads (and it outperforms Roots.jl, https://twitter.com/ChrisRackauckas/status/1544743542094020615)

j_verzani · September 28, 2022, 8:50pm

To close out this conversation from the Roots end, the newest version, closing the issue opened based on this thread, provides a different solution:

With ForwardDiff it is important – due to how types propagate – to start with a type matching the parameter:

using Roots, ForwardDiff, Test

f( x, a ) = log( x ) - a
fₓ( x, a ) = ForwardDiff.derivative( t -> f( t, a ), x )

mexp(a) = find_zero((f, fₓ), 1, Roots.Newton(), a)  # x0 = 1 -- not one(a)!
@test mexp(log(π)) ≈ π
@test_throws MethodError ForwardDiff.derivative(mexp, 1)

mexp_(a) = find_zero((f, fₓ), one(a), Roots.Newton(), a)
@test ForwardDiff.derivative(mexp_, 1) ≈ exp(1)

# using the closure style you started with
mexp1(a) = find_zero((x -> f(x,a), x->fₓ(x,a)), one(a), Roots.Newton())
@test ForwardDiff.derivative(mexp1, 1) ≈ exp(1)

The adjoint method, mentioned by @stevengj, can avoid the extra effort of pushing the AD machinery through the algorithm:

a = log(π)
xᵅ = find_zero((f,fₓ), 1, Roots.Newton(), a)
fx = ForwardDiff.derivative(x -> f(x, a), xᵅ)
fp = ForwardDiff.derivative(a -> f(xᵅ, a), a) # gradient if `a` is a vector
@test - fp / fx ≈ exp(a)

This led to a new frule and rrule in Roots so that AD programs should be able to differentiate directly (with the parameter passed in to find_zero, not within a closure):

using Zygote # uses rrule from Roots
@test first(Zygote.gradient(mexp, 1)) ≈ exp(1)

joaovgranja · September 29, 2022, 9:54am

This is great. Others have provided very useful input, but since this addresses the exact original question, I feel like this should be accepted as a solution.

Many thanks for your work on this!

Topic		Replies	Views
How to autodifferentiate the results of NLsolve? General Usage autodiff	24	2482	March 8, 2023
Custom rule for differentiating through Newton solver using ForwardDiff: works for gradient, fails for hessian Performance roots , forwarddiff	23	998	May 26, 2023
A Unified Interface for Rootfinding Numerics package	22	5496	December 24, 2022
Automatic Differentiation with custom functions within likelihood function General Usage question , package	6	1299	November 29, 2019
Help making ForwardDiff.derivative work Optimization (Mathematical) forwarddiff , autodiff	2	46	January 24, 2025

Autodiff-ing a function defined by the result of Roots.find_zero fails

Related topics