Second-order autodiff: which combinations should work?

Hey there!

My package DifferentiationInterface.jl provides the necessary infrastructure for second-order automatic differentiation (e.g. Hessians). It allows you to combine arbitrary pairs of AD backends, but of course many of the combinations will fail. My question is therefore: which pairs of backends should I aim to support and test?

This was prompted by issue #278 of DI, which asks for second-order Enzyme support, but we can try to cover more ground! Judging by the list in the README, there are 13 available AD packages, which translates to roughly 14 different backends (counting forward and reverse Enzyme). I don’t want to test $\binom{14}{2}$ combinations, so I thought of some ways to reduce the list:

  • symbolic backends (Symbolics.jl and FastDifferentiation.jl) will rarely be paired with something else, so we can discard them
  • finite-difference backends (FiniteDiff.jl and FiniteDifferences.jl) should pair well with most other backends, so I’m less curious about the results of testing them
  • experimental backends (Diffractor.jl and Tapir.jl) are low-priority
  • there are some near-duplicates in the list (I anticipate bikeshedding on this):
    • PolyesterForwardDiff.jl is well-tested if ForwardDiff.jl is
    • ChainRulesCore.jl is well-tested if Zygote.jl is

Here’s the table of combinations that are currently part of the test suite for Hessians. I will add your suggestions to it as the discussion progresses (if they make sense). The table should be read as “outer (row) over inner (column)”.

outer \ inner Enz [F] Enz [R] ForDiff RevDiff Tracker Zygote
Enz [F] wanted tested wanted
Enz [R] wanted tested
ForDiff tested wanted tested wanted tested
RevDiff tested
Tracker
Zygote tested
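
For concreteness, here is a minimal sketch of what one “outer over inner” cell looks like in code, assuming the SecondOrder(outer, inner) wrapper and the hessian operator from DI. The test function f and the input x are made up for illustration, and the pair shown (ForwardDiff over Zygote) is just one possible combination:

```julia
using DifferentiationInterface
import ForwardDiff, Zygote  # the two AD packages being combined

# Illustrative test function (an assumption, not part of the DI test suite).
f(x) = sum(x .^ 3)

# "outer over inner": ForwardDiff (outer) over Zygote (inner).
backend = SecondOrder(AutoForwardDiff(), AutoZygote())

x = [1.0, 2.0, 3.0]
H = hessian(f, backend, x)  # analytically, the Hessian is Diagonal(6x)
```

Swapping the two arguments of SecondOrder selects a different cell of the table.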

What do you think?
Pinging @oxinabox @wsmoses @ChrisRackauckas @Vaibhavdixit02 @avikpal

Enzyme over Enzyme works just fine. You just need to use the “deferred” form (autodiff_deferred) for the interior.

Forward over Tracker has issues, I’d just ignore that one.

The rest looks right.

For an example of Enzyme over Enzyme, see the JuMP tutorial “Automatic differentiation of user-defined operators”.

I added this trick in the following PR, but I wonder if it is possible to learn a lesson from it.

The underlying issue is that for Enzyme, you have to do something different in order to enable higher-order differentiation (use autodiff_deferred instead of autodiff).

  • Do we lose something if we use autodiff_deferred everywhere? Maybe @vchuravy can help.
  • Is this dichotomy also true for other backends, like Zygote.jl or Tapir.jl (@willtebbutt)? In that case we might be able to define two versions of important operators like DI.gradient and DI.derivative: an optimized one and a higher-order friendly one.

The need for deferred is specific to GPU-compiler related packages (including CUDA.jl, etc).

It’s been on our todo list to make our abstract interpreter automatically upgrade internal autodiff calls to deferred, but I don’t know enough about the Julia abstract interpreter to do so, and we’ve so far not found someone who does (open issue: “Automate use of deferred in Higher order derivatives”, Issue #1005 in EnzymeAD/Enzyme.jl).

Answering your earlier question: generally speaking, Enzyme on the outside of all of those AD libraries should work in principle (including itself, with deferred on the inside). In practice, I’m not sure, but that’s why it’s worth testing.

Thanks for the answer!

My later question was rather: “is it suboptimal in terms of performance if I replace every autodiff with autodiff_deferred in DI (even for standard first order stuff)”?
It would make my life a lot simpler not having to handle two versions of each operator, a direct one and a deferred one.

@oxinabox and I are sorting out Diffractor.jl-over-Tapir.jl (forwards-over-reverse) at the minute. I don’t believe there will be any special requirements to defer stuff, or anything like that. All this being said, this is work-in-progress, so things might change.

There are complications that come if you use deferred, which is why autodiff itself doesn’t just use it by default (though this remains a debate between myself and @vchuravy).

If you know anyone with abstract interpreter experience who could help us get the automatic upgrade from autodiff to autodiff_deferred over the finish line, and that is easier than writing the wrapper code in DI, go for it!

I’m sorry that’s not gonna be me or anyone I know well ^^

I think I’m gonna go for an additional set of operators that are not exposed in the API but that will basically amount to gradient_higher_order_friendly. It’s ugly but I should be able to do it with minimal code

If you want the gradient of a scalar-valued function that depends on the gradient of another scalar-valued function, you can use forward-over-reverse combining ForwardDiff with e.g. Zygote or Enzyme or ReverseDiff. See:

(You can also use this approach for general Hessians, but it was less obvious to me that it is efficient for scalar-valued functions.)
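
As a rough sketch of forward-over-reverse with ForwardDiff over Zygote, a Hessian-vector product can be obtained by taking a ForwardDiff directional derivative of the Zygote gradient. The function f, the point x and the direction v below are made up for illustration:

```julia
import ForwardDiff, Zygote

# Illustrative scalar-valued function (an assumption, not from the thread).
f(x) = sum(x .^ 3)

# Inner reverse pass: gradient of f via Zygote.
∇f(x) = Zygote.gradient(f, x)[1]

# Outer forward pass: differentiate t -> ∇f(x + t*v) at t = 0 with ForwardDiff,
# which gives the Hessian-vector product H(x) * v without forming H.
hvp(x, v) = ForwardDiff.derivative(t -> ∇f(x + t * v), 0.0)

x = [1.0, 2.0, 3.0]
v = [1.0, 0.0, 1.0]
Hv = hvp(x, v)  # analytically, H(x) = Diagonal(6x)
```

A full Hessian could be assembled the same way by replacing the directional derivative with ForwardDiff.jacobian applied to ∇f, at the cost of one forward direction per input dimension.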

At the moment I’m only interested in plain boring second-order autodiff of a function f that is presumably defined without autodiff inside of it

I use ForwardDiff.jl over ReverseDiff.jl for Hessians regularly, which has great performance with compiled tapes. Here is my little wrapper struct for doing this efficiently, although maybe the AD people here will wince: ~cgeoga/StandaloneKNITRO.jl (master): src/forwrapper.jl - sourcehut git.

That’s interesting, thanks for sharing!

I ended up wrapping the backend object, so that AutoDeferredEnzyme uses autodiff_deferred and AutoEnzyme uses autodiff.
Second order with forward Enzyme over reverse Enzyme now works in DI (as of v0.5.1), and we can start testing more package combinations too!
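
The wrapping idea can be sketched with plain dispatch, roughly as follows. This is a dependency-free illustration, not the actual DI code: the type names SketchAutoEnzyme and SketchAutoDeferredEnzyme and the functions nested, entry_point and inner_entry_point are all hypothetical stand-ins, and the Enzyme entry points are represented by symbols instead of real calls.

```julia
# Hypothetical stand-ins; these are NOT the real ADTypes / DI / Enzyme definitions.
struct SketchAutoEnzyme end          # plays the role of AutoEnzyme
struct SketchAutoDeferredEnzyme end  # plays the role of AutoDeferredEnzyme

# Convert a backend into its nesting-friendly variant.
# By default this is a no-op; only the Enzyme stand-in needs a different wrapper.
nested(backend) = backend
nested(::SketchAutoEnzyme) = SketchAutoDeferredEnzyme()

# Which entry point each backend would call
# (returned as a Symbol to keep the sketch free of dependencies).
entry_point(::SketchAutoEnzyme) = :autodiff
entry_point(::SketchAutoDeferredEnzyme) = :autodiff_deferred

# A second-order operator would convert only the inner backend before using it.
inner_entry_point(inner) = entry_point(nested(inner))
```

With this layout, first-order calls keep the direct path, and only the inner backend of a second-order call is switched to the deferred variant.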
