My package DifferentiationInterface.jl provides the necessary infrastructure for second-order automatic differentiation (e.g. Hessians). It allows you to combine arbitrary pairs of AD backends, but of course many of the combinations will fail. My question is therefore: which pairs of backends should I aim to support and test?
This was prompted by issue #278 of DI, which asks for second-order Enzyme support, but we can try to cover more ground! Judging by the list in the README, there are 13 available AD packages, which translates to roughly 14 different backends (counting forward and reverse Enzyme separately). I don't want to test all \binom{14}{2} = 91 combinations, so I thought of some ways to reduce the list:
- symbolic backends (Symbolics.jl and FastDifferentiation.jl) will rarely be paired with anything else, so we can discard them
- finite differences backends (FiniteDiff.jl and FiniteDifferences.jl) should pair well with most other backends, so I'm less curious about the results of testing them
- experimental backends (Diffractor.jl and Tapir.jl) are low-priority
- there are some near-duplicates in the list (I anticipate bikeshedding on this):
  - PolyesterForwardDiff.jl is well-tested if ForwardDiff.jl is
  - ChainRulesCore.jl is well-tested if Zygote.jl is
Here's the table of combinations that are currently part of the test suite for Hessians; I will add your suggestions to it as the discussion progresses (if they make sense). The table should be read as "outer (row) over inner (column)".
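For reference, combining an outer and an inner backend for a Hessian in DI looks roughly like this (a minimal sketch; `SecondOrder` and `hessian` are part of DI, the `Auto*` objects come from ADTypes.jl, and the exact signatures may differ between DI versions):

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff, AutoZygote
import ForwardDiff, Zygote  # load the backends so DI's extensions activate

f(x) = sum(abs2, x) / 2

# "outer over inner": ForwardDiff (row) over Zygote (column)
backend = SecondOrder(AutoForwardDiff(), AutoZygote())
H = hessian(f, backend, rand(3))
```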
I added this trick in the following PR, but I wonder whether there is a broader lesson to learn from it.
The underlying issue is that Enzyme requires you to do something different to enable higher-order differentiation: use autodiff_deferred instead of autodiff.
Do we lose something if we use autodiff_deferred everywhere? Maybe @vchuravy can help.
Is this dichotomy also true for other backends, like Zygote.jl or Tapir.jl (@willtebbutt)? In that case, we might be able to define two versions of important operators like DI.gradient and DI.derivative: an optimized one and a higher-order-friendly one.
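To illustrate the dichotomy: with Enzyme, the inner reverse pass has to go through autodiff_deferred so that the outer forward pass can differentiate it. A minimal Hessian-vector product sketch along those lines (the exact activity annotations have shifted across Enzyme versions, so treat this as indicative):

```julia
using Enzyme

f(x) = sum(abs2, x) / 2

# Inner pass: reverse-mode gradient written with autodiff_deferred,
# so that an outer autodiff call can differentiate through it.
# dx is zeroed first because reverse mode accumulates into the shadow.
function grad!(dx, x)
    fill!(dx, 0)
    Enzyme.autodiff_deferred(Reverse, f, Active, Duplicated(x, dx))
    return nothing
end

# Outer pass: forward mode over grad! yields a Hessian-vector product.
x = [1.0, 2.0, 3.0]
v = [1.0, 0.0, 0.0]
dx = zero(x)  # primal output: the gradient of f at x
hv = zero(x)  # tangent output: H * v
Enzyme.autodiff(Forward, grad!, Duplicated(dx, hv), Duplicated(x, v))
```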
Answering your earlier question: generally speaking, Enzyme on the outside of all of those AD libraries should work (including over itself, with deferred on the inside). Whether it does in practice, I'm not sure, but that's why it's worth testing.
My later question was rather: "is it suboptimal in terms of performance if I replace every autodiff with autodiff_deferred in DI (even for standard first-order stuff)"?
It would make my life a lot simpler not having to handle two versions of each operator, a direct one and a deferred one.
@oxinabox and I are sorting out Diffractor.jl-over-Tapir.jl (forward-over-reverse) at the minute. I don't believe there will be any special requirements to defer stuff, or anything like that. All this being said, this is work in progress, so things might change.
There are complications that come if you use deferred, which is why autodiff itself doesn't just use it by default (though this remains a debate between myself and @vchuravy).
If you know anyone with abstract-interpreter experience who could help us get the autodiff-to-autodiff_deferred unification over the finish line, and if that is easier than writing the wrapper code in DI, go for it!
I'm sorry, that's not gonna be me or anyone I know well ^^
I think I'm gonna go for an additional set of operators that are not exposed in the API but that will basically amount to gradient_higher_order_friendly. It's ugly, but I should be able to do it with minimal code.
If you want the gradient of a scalar-valued function that depends on the gradient of another scalar-valued function, you can use forward-over-reverse, combining ForwardDiff with e.g. Zygote, Enzyme, or ReverseDiff. See:
(You can also use this approach for general Hessians, but it was less obvious to me that it is efficient for scalar-valued functions.)
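A minimal sketch of that pattern with ForwardDiff over Zygote (the function names here are made up for illustration; this is the same forward-over-reverse idea that Zygote.hessian uses internally, as far as I know):

```julia
using ForwardDiff, Zygote

inner(x) = sum(abs2, x) / 2  # scalar-valued
# scalar-valued function of the *gradient* of `inner`
outer(x) = sum(sin, first(Zygote.gradient(inner, x)))

# forward-over-reverse: ForwardDiff pushes dual numbers through
# Zygote's reverse-mode gradient of `inner`
g = ForwardDiff.gradient(outer, rand(3))
```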
I use ForwardDiff.jl over ReverseDiff.jl for Hessians regularly, which has great performance with compiled tapes. Here is my little wrapper struct for doing this efficiently, although maybe the AD people here will wince: ~cgeoga/StandaloneKNITRO.jl (master): src/forwrapper.jl - sourcehut git.
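The plain (uncompiled) version of that combination can be sketched as follows; my understanding is that the linked wrapper goes further by pre-recording a ReverseDiff tape at Dual-typed inputs and compiling it, which is where the performance comes from:

```julia
using ForwardDiff, ReverseDiff

f(x) = sum(abs2, x) + prod(x)

# forward-over-reverse: ForwardDiff's Jacobian of ReverseDiff's gradient;
# this works because ReverseDiff's tracked numbers are generic enough
# to carry ForwardDiff.Dual values
hess(x) = ForwardDiff.jacobian(y -> ReverseDiff.gradient(f, y), x)

H = hess(rand(4))  # 4×4 Hessian
```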
I ended up wrapping the backend object, so that AutoDeferredEnzyme uses autodiff_deferred and AutoEnzyme uses autodiff. Second order with forward Enzyme over reverse Enzyme now works in DI (as of v0.5.1), and we can start testing more package combinations too!
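For the curious, the wrapper idea amounts to something like this (a hypothetical sketch with made-up function names, not DI's actual internals):

```julia
using ADTypes: AbstractADType, AutoEnzyme

# Hypothetical wrapper type: carries the same Enzyme mode, but is flagged
# so that DI's Enzyme extension dispatches to autodiff_deferred.
struct AutoDeferredEnzyme{M} <: AbstractADType
    mode::M
end

# Applied to whichever backend lands on the inside of a SecondOrder pair;
# backends other than Enzyme need no special treatment.
nested(backend::AutoEnzyme) = AutoDeferredEnzyme(backend.mode)
nested(backend::AbstractADType) = backend
```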