How to force Flux to use FiniteDiff

If you are doing a bilevel optimization (optimizing a function that itself solves an optimization problem), you can define your own rrule (a reverse-mode rule supplying the vector–Jacobian product, via ChainRules) to tell Zygote how to differentiate it efficiently using the implicit-function theorem. (Basically, you differentiate the KKT conditions that characterize your inner optimum, rather than the solver that found it.)
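To make the implicit-function-theorem idea concrete, here is a minimal sketch in Python rather than Julia, using a made-up scalar inner problem: x*(θ) = argmin_x x^4/4 + x^2/2 − θx, whose optimality condition is g(x, θ) = x^3 + x − θ = 0. If g(x*(θ), θ) = 0 for all θ, then dx*/dθ = −(∂g/∂x)^(−1) ∂g/∂θ, evaluated only at the converged solution. The function names (solve_inner, solve_inner_rrule, etc.) are hypothetical. The structure mirrors a Julia rrule: return the primal result plus a pullback that maps an output cotangent to an input cotangent. In Julia you would instead add a ChainRulesCore.rrule method for your solver, whose pullback also returns NoTangent() for the function itself.

```python
# Hypothetical inner problem: x*(theta) = argmin_x x**4/4 + x**2/2 - theta*x.
# Its optimality (KKT/stationarity) condition is g(x, theta) = x**3 + x - theta = 0.

def g(x, theta):
    """Stationarity condition of the inner problem."""
    return x**3 + x - theta

def dg_dx(x, theta):
    return 3 * x**2 + 1

def dg_dtheta(x, theta):
    return -1.0

def solve_inner(theta, x0=0.0, tol=1e-12, max_iter=100):
    """Newton's method on g(x, theta) = 0. The solver itself is never differentiated."""
    x = x0
    for _ in range(max_iter):
        step = g(x, theta) / dg_dx(x, theta)
        x -= step
        if abs(step) < tol:
            break
    return x

def solve_inner_rrule(theta):
    """Analogue of a Julia rrule: return the primal result and a pullback (VJP)."""
    x = solve_inner(theta)

    def pullback(x_bar):
        # Implicit-function theorem: dx/dtheta = -(dg/dx)^-1 * dg/dtheta,
        # using only the converged solution x, not the Newton iterates.
        dx_dtheta = -dg_dtheta(x, theta) / dg_dx(x, theta)
        return x_bar * dx_dtheta

    return x, pullback

if __name__ == "__main__":
    theta = 2.0
    x, pb = solve_inner_rrule(theta)
    # Analytically, x* = 1 and dx*/dtheta = 1/(3*1**2 + 1) = 1/4 at theta = 2.
    print("x* =", x, " dx*/dtheta =", pb(1.0))
    # Sanity check against a central finite difference through the solver.
    h = 1e-6
    fd = (solve_inner(theta + h) - solve_inner(theta - h)) / (2 * h)
    print("finite difference:", fd)
```

The backward pass costs one linear solve with ∂g/∂x (here a scalar division) no matter how many Newton iterations the forward solve took. For a real bilevel problem, g would be the full KKT system and that division becomes a linear solve with its Jacobian.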

In general, AD tools need a bit of “help” whenever the function you are differentiating solves a problem approximately by an iterative method (e.g. Newton iterations for root finding, iterative optimization algorithms, or adaptive quadrature). Even if AD can trace through the iterations, it will end up wasting a lot of effort trying to exactly differentiate the error in your approximation.
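As a toy illustration of what "differentiating through the iterations" means, this sketch pushes forward-mode dual numbers through each Newton step for the same hypothetical equation x^3 + x − θ = 0 used above. The minimal Dual class is written just for this example and supports only the operations used here. The derivative it carries is that of the k-th iterate, so derivative work is done on every step, including the early ones far from the solution, whereas the implicit-function theorem needs only the converged x. This does not model Zygote's actual costs; in reverse mode the intermediate iterates generally also have to be kept around for the backward pass.

```python
class Dual:
    """Minimal dual number val + der*eps with eps**2 = 0 (only the ops used here)."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _lift(self, other):
        # Treat plain numbers as constants (zero derivative).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __mul__(self, other):
        o = self._lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__  # handles 3 * x

    def __truediv__(self, other):
        o = self._lift(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / o.val**2)

def newton_unrolled(theta, n_iter, x0=0.0):
    """Run n_iter Newton steps on x**3 + x - theta = 0, differentiating every step."""
    x = Dual(x0)
    for _ in range(n_iter):
        x = x - (x * x * x + x - theta) / (3 * x * x + 1)
    return x

def ift_derivative(x):
    """Implicit-function-theorem derivative dx/dtheta at a solution x."""
    return 1.0 / (3 * x**2 + 1)

if __name__ == "__main__":
    theta = Dual(2.0, 1.0)  # seed d(theta)/d(theta) = 1
    for n in (1, 3, 6, 10):
        x = newton_unrolled(theta, n)
        print(f"{n:2d} iters: x = {x.val:.12f}, unrolled dx/dtheta = {x.der:.12f}, "
              f"IFT at x = {ift_derivative(x.val):.12f}")
```

Once the iteration has converged, the unrolled derivative agrees with the implicit-function-theorem value, but only after paying for derivative propagation through every step along the way.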

See also Differentiating optimization problem solutions in Julia
