Testing and tolerances for implicitly calculated functions and their AD

To fix ideas, suppose we have a function x = g(p) defined implicitly by f(x(p), p) = 0, where both arguments are scalars.

The numerical implementation involves finding an x such that |f(x, p)| \le \mathrm{tol}, either by Newton’s method or Brent’s method, depending on circumstances.

Once x is found, the derivative x'(p) = - (\partial f/\partial p)/(\partial f/\partial x) can be readily calculated and incorporated into an AD rule, which I will want to unit test.
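
For concreteness, here is a minimal toy implementation of this setup. The residual `f`, the solver `g`, and the derivative `dg_dp` are illustrative names I use in the sketches below, not from any package, and the particular f is made up:

```julia
# Toy residual: g(p) solves f(x, p) = 0 for x.
f(x, p) = x^3 + p * x - p

# Newton iteration, stopping when |f(x, p)| <= tol.
function g(p; tol = 1e-12, maxiter = 100)
    x = one(p)                      # crude initial guess
    for _ in 1:maxiter
        x -= f(x, p) / (3x^2 + p)   # 3x^2 + p is ∂f/∂x for this f
        abs(f(x, p)) <= tol && return x
    end
    error("Newton did not converge")
end

# Implicit function theorem: x'(p) = -(∂f/∂p)/(∂f/∂x) at the root.
function dg_dp(p; tol = 1e-12)
    x = g(p; tol = tol)
    return -(x - 1) / (3x^2 + p)    # ∂f/∂p = x - 1, ∂f/∂x = 3x^2 + p
end
```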

I am running into two issues:

  1. Most test frameworks use the excellent FiniteDifferences.jl, so one can pass in a finite difference method for checking the derivative. However, I am unsure how to parametrize it: there is a `factor = ...` argument, but it refers to relative noise, while even the absolute error of these calculations, let alone the relative one, depends on p and the tolerance. Is there anything else I should be doing? (See the sketch after this list.)

  2. When it comes to unit testing, it is unclear how to choose tolerances. In practice I have found that some tests fail unless I choose loose tolerances (think rtol = 0.05), but then I worry that errors in my code are just passing silently.
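
To make issue 1 concrete, here is a sketch of how I imagine parametrizing it, reusing the toy `g` and `dg_dp` from above. The scaling of `factor` below is my reading of it as an amplifier on the assumed round-off noise, and the numbers are illustrative, not recommendations:

```julia
using FiniteDifferences, Test

p = 2.0
solver_tol = 1e-12

# Evaluations of g are only accurate to roughly solver_tol, not to machine
# epsilon, so amplify FiniteDifferences' assumed round-off noise accordingly.
noise_factor = solver_tol / (eps() * abs(g(p; tol = solver_tol)))
fdm = central_fdm(5, 1; factor = noise_factor)

# Issue 2 in miniature: the rtol here has to absorb both the finite-difference
# error and the solver tolerance, which is the part that is hard to choose.
@test fdm(q -> g(q; tol = solver_tol), p) ≈ dg_dp(p; tol = solver_tol) rtol = 1e-6
```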

The easiest thing to do, which is what NonlinearSolve.jl does for its testing here, is to just test ForwardDiff vs the forward-mode rule vs the reverse-mode rule, so one differentiates the algorithm and the other two check the rules against it (a sketch below).
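
A minimal sketch of that cross-check, using the toy `g`/`dg_dp` from the question: push ForwardDiff through the Newton iteration itself (differentiating the algorithm) and compare against the implicit-function rule.

```julia
using ForwardDiff, Test

p = 2.0
ad_through_solver = ForwardDiff.derivative(g, p)  # differentiates the iteration
rule_value        = dg_dp(p)                      # implicit function theorem

# With a tight solver tolerance these should agree closely.
@test ad_through_solver ≈ rule_value rtol = 1e-8
```

Note that the stopping criterion in `g` only checks the primal residual, so the dual part can lag behind convergence; that is exactly the effect described further below.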

FiniteDifferences.jl is actually a really bad idea for this kind of thing because of numerical instabilities, so it shouldn’t be used as a way to check rules, but :person_shrugging: I’ve brought this up before. For the simple cases in ChainRules.jl that doesn’t really matter, but for most things you’d see in SciML it’s not even a convergent way to do things once you account for the numerical error, so I highly recommend that people never use that package in practice. Either use FiniteDiff.jl as a fallback, since it’s basically just the caching AD interfaces with the simplest finite difference methods (a sketch below), or nothing; higher-order FiniteDifferences.jl is a huge can of worms in most real cases you’d want to test.
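
For that fallback, a sketch with FiniteDiff.jl’s scalar derivative, again on the toy `g` from the question (I’m recalling the `finite_difference_derivative` signature from memory, so treat the exact call as an assumption):

```julia
using FiniteDiff

p = 2.0
# Simple central difference as a sanity check, not a high-order scheme.
fd = FiniteDiff.finite_difference_derivative(g, p, Val(:central))
@show fd dg_dp(p)
```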

This can be difficult: for implicit problems, how close you get to convergence depends on the numerical solving technique used and on the condition number of the Jacobian. There is a pullback error in the rule that is proportional to both of these factors (I need to write that down somewhere :sweat_smile:), so yes, you can have cases where differentiating the algorithm and the implicit function rule differ by 1e-3 even though the solver tolerance is 1e-12 (and once you see the proofs of this error bound, it’s not hard to construct such edge cases). So the answer is: it depends. A toy illustration is sketched below.
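
A hedged toy illustration of that “it depends”, reusing the `g`/`dg_dp` sketch from the question: loosen the solver tolerance and the gap between differentiating the algorithm and applying the implicit rule grows, even though both are “correct” at convergence. (The example is well conditioned, so the gaps here are mild; a badly conditioned Jacobian amplifies them.)

```julia
using ForwardDiff

p = 2.0
for tol in (1e-4, 1e-8, 1e-12)
    ad   = ForwardDiff.derivative(q -> g(q; tol = tol), p)  # diff the algorithm
    rule = dg_dp(p; tol = tol)                               # implicit rule
    println("tol = $tol   |algorithm AD - rule| = ", abs(ad - rule))
end
```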