Second-order autodiff: which combinations should work?

gdalle · May 29, 2024, 6:19am

Hey there!

My package DifferentiationInterface.jl provides the necessary infrastructure for second-order automatic differentiation (e.g. Hessians). It allows you to combine arbitrary pairs of AD backends, but of course many of the combinations will fail. My question is therefore: which pairs of backends should I aim to support and test?

This was prompted by issue #278 of DI, which asks for second-order Enzyme support, but we can try to cover more ground! Judging by the list in the README, there are 13 available AD packages, which translates to roughly 14 different backends (counting forward and reverse Enzyme). I don’t want to test \binom{14}{2} combinations, so I thought of some ways to reduce the list:

- Symbolic backends (Symbolics.jl and FastDifferentiation.jl) will rarely be paired with something else, so we can discard them.
- Finite-difference backends (FiniteDiff.jl and FiniteDifferences.jl) should pair well with most other backends, so I’m less curious about the results of testing them.
- Experimental backends (Diffractor.jl and Tapir.jl) are low-priority.
- There are some near-duplicates in the list (I anticipate bikeshedding on this):
  - PolyesterForwardDiff.jl is well-tested if ForwardDiff.jl is
  - ChainRulesCore.jl is well-tested if Zygote.jl is
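
As a rough sanity check of how much this pruning helps, here is a small counting sketch. The backend names are simply the ones mentioned in this post and in the table below; the variable names are made up for illustration, and the "ordered pairs" count is an assumption that outer-over-inner order matters and that a backend may be paired with itself.

```julia
# Backend names as they appear in this thread; Enzyme is counted twice
# (forward and reverse mode), which gives 14 backends for 13 packages.
all_backends = [
    "Enzyme [F]", "Enzyme [R]", "ForwardDiff", "ReverseDiff", "Tracker", "Zygote",
    "Symbolics", "FastDifferentiation",      # symbolic
    "FiniteDiff", "FiniteDifferences",       # finite differences
    "Diffractor", "Tapir",                   # experimental
    "PolyesterForwardDiff", "ChainRulesCore" # near-duplicates
]

# Backends dropped by the four rules listed above.
discarded = [
    "Symbolics", "FastDifferentiation",
    "FiniteDiff", "FiniteDifferences",
    "Diffractor", "Tapir",
    "PolyesterForwardDiff", "ChainRulesCore",
]

remaining = setdiff(all_backends, discarded)

# Unordered pairs of distinct backends, as in \binom{14}{2}.
n_unordered = binomial(length(all_backends), 2)

# Ordered (outer, inner) pairs, including a backend over itself.
n_ordered_all = length(all_backends)^2
n_ordered_remaining = length(remaining)^2

println("unordered pairs of all backends: ", n_unordered)
println("ordered pairs of all backends:   ", n_ordered_all)
println("remaining backends:              ", remaining)
println("ordered pairs after pruning:     ", n_ordered_remaining)
```

With these assumptions the six remaining backends are exactly the rows of the table below.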

Here’s the table of combinations that are currently part of the test suite for Hessians, I will add your suggestions to it as the discussion progresses (if they make sense). The table should be read as “outer (row) over inner (column)”.

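| outer \ inner | Enz [F] | Enz [R] | ForDiff | RevDiff | Zygote |
|---------------|---------|---------|---------|---------|--------|
| Enz [F]       | wanted  | tested  | wanted  |         |        |
| Enz [R]       | wanted  | tested  |         |         |        |
| ForDiff       | tested  | wanted  | tested  | wanted  | tested |
| RevDiff       |         |         |         | tested  |        |
| Tracker       |         |         |         |         |        |
| Zygote        |         |         |         |         | tested |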
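
To make the “outer over inner” convention concrete, here is a minimal sketch of one "tested" cell (ForwardDiff over Zygote). It assumes that DifferentiationInterface.jl exposes `SecondOrder(outer, inner)` and `hessian(f, backend, x)` as in its documentation, and that ForwardDiff.jl and Zygote.jl are installed; the test function `f` is just an illustration.

```julia
using DifferentiationInterface
import ForwardDiff, Zygote

# Illustrative test function; analytically its Hessian is the diagonal
# matrix with entries 6 .* x.
f(x) = sum(x .^ 3)
x = [1.0, 2.0, 3.0]

# Outer backend first (row of the table), inner backend second (column).
backend = SecondOrder(AutoForwardDiff(), AutoZygote())

H = hessian(f, backend, x)
println(H)
```

Swapping the two arguments of `SecondOrder` would correspond to the transposed cell of the table (Zygote over ForwardDiff), which is not guaranteed to behave the same way.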

What do you think?
Pinging @oxinabox @wsmoses @ChrisRackauckas @Vaibhavdixit02 @avikpal

ChrisRackauckas · May 29, 2024, 7:00am

Enzyme over enzyme works just fine. You just need to use the “delayed” form for the interior.

Forward over Tracker has issues, I’d just ignore that one.

The rest looks right.

odow · May 29, 2024, 9:46am

For an example of Enzyme over Enzyme, see the JuMP tutorial “Automatic differentiation of user-defined operators”.

gdalle · May 29, 2024, 11:33am

I added this trick in the following PR, but I wonder if it is possible to learn a lesson from it.

The underlying issue is that for Enzyme, you have to do something differently in order to enable higher-order differentiation (use autodiff_deferred instead of autodiff).

Do we lose something if we use autodiff_deferred everywhere? Maybe @vchuravy can help.
Is this dichotomy also true for other backends, like Zygote.jl or Tapir.jl (@willtebbutt)? In that case we might be able to define two versions of important operators like DI.gradient and DI.derivative: an optimized one and a higher-order friendly one.

wsmoses · May 29, 2024, 11:40am

The need for deferred is specific to GPU-compiler related packages (including CUDA.jl, etc).

It’s been on our todo list to make our abstract interpreter automatically upgrade internal autodiffs to deferred, but I don’t know enough about the Julia abstract interpreter to do so, and we’ve so far not found someone who does (open issue: “Automate use of deferred in Higher order derivatives”, Issue #1005 of EnzymeAD/Enzyme.jl).

Answering your earlier question, generally speaking Enzyme on the outside of all of those AD libraries should work in theory (including itself, with deferred on the inside). In practice, I’m not sure, but that’s why it’s worth testing.

gdalle · May 29, 2024, 11:44am

Thanks for the answer!

My later question was rather: “is it suboptimal in terms of performance if I replace every autodiff with autodiff_deferred in DI (even for standard first order stuff)”?
It would make my life a lot simpler not having to handle two versions of each operator, a direct one and a deferred one.

willtebbutt · May 29, 2024, 11:57am

@oxinabox and I are sorting out Diffractor.jl-over-Tapir.jl (forwards-over-reverse) at the minute. I don’t believe there will be any special requirements to defer stuff, or anything like that. All this being said, this is work-in-progress, so things might change.

wsmoses · May 29, 2024, 12:10pm

There are complications that come if you use deferred, which is why autodiff itself doesn’t just use it by default (though this remains a debate between myself and @vchuravy).

If you know anyone with abstract interpreter experience who could help us get the automatic autodiff-to-autodiff_deferred upgrade over the finish line, and that is easier than writing the wrapper code in DI, go for it!

gdalle · May 29, 2024, 1:58pm

I’m sorry that’s not gonna be me or anyone I know well ^^

I think I’m gonna go for an additional set of operators that are not exposed in the API but that will basically amount to gradient_higher_order_friendly. It’s ugly but I should be able to do it with minimal code

stevengj · May 29, 2024, 2:05pm

If you want the gradient of a scalar-valued function that depends on the gradient of another scalar-valued function, you can use forward-over-reverse combining ForwardDiff with e.g. Zygote or Enzyme or ReverseDiff. See:

(You can also use this approach for general Hessians, but it was less obvious to me that it is efficient for scalar-valued functions.)
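
A minimal sketch of the forward-over-reverse idea, assuming ForwardDiff.jl and Zygote.jl are installed: the Hessian-vector product H(x)·v is the directional derivative of the gradient, d/dε ∇f(x + εv) at ε = 0, so the reverse-mode gradient goes inside and a single forward-mode derivative goes outside. The helper name `hvp` and the test function are made up for illustration. For instance, g(x) = ‖∇f(x)‖² has gradient 2·H(x)·∇f(x), which costs one such product.

```julia
using ForwardDiff, Zygote

# Illustrative scalar-valued function; its Hessian is Diagonal(6 .* x).
f(x) = sum(x .^ 3)

# Hypothetical helper: forward mode (ForwardDiff) over reverse mode (Zygote).
# The inner call computes the gradient at the perturbed point x + ε * v,
# the outer call differentiates that gradient with respect to ε at ε = 0.
hvp(f, x, v) = ForwardDiff.derivative(ε -> Zygote.gradient(f, x + ε * v)[1], 0.0)

x = [1.0, 2.0, 3.0]
v = [1.0, 0.0, -1.0]

Hv = hvp(f, x, v)

# Gradient of g(x) = ‖∇f(x)‖², obtained from one Hessian-vector product.
∇f = Zygote.gradient(f, x)[1]
∇g = 2 .* hvp(f, x, ∇f)

println(Hv)
println(∇g)
```

The cost is roughly a small constant multiple of one gradient evaluation, and the full Hessian is never formed.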

gdalle · May 29, 2024, 2:25pm

At the moment I’m only interested in plain boring second-order autodiff of a function f that is presumably defined without autodiff inside of it

cgeoga · May 29, 2024, 2:34pm

I use ForwardDiff.jl over ReverseDiff.jl for Hessians regularly, which has great performance with compiled tapes. Here is my little wrapper struct for doing this efficiently, although maybe the AD people here will wince: ~cgeoga/StandaloneKNITRO.jl (master): src/forwrapper.jl - sourcehut git.

gdalle · May 29, 2024, 2:46pm

That’s interesting, thanks for sharing!

gdalle · May 30, 2024, 10:13am

I ended up wrapping the backend object, so that AutoDeferredEnzyme uses autodiff_deferred and AutoEnzyme uses autodiff.
Second order with forward Enzyme over reverse Enzyme now works in DI (as of v0.5.1), and we can start testing more package combinations too!
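
The wrapping idea can be sketched in plain Julia with dispatch. Everything below is hypothetical: the type and function names (`SketchEnzyme`, `SketchDeferredEnzyme`, `engine`, `nested`, `run_operator`) are stand-ins made up for illustration, not the actual DI or Enzyme definitions, and the "engines" only return a label instead of calling Enzyme.

```julia
# Hypothetical stand-ins, NOT the real ADTypes / DI / Enzyme definitions.
abstract type SketchBackend end
struct SketchEnzyme <: SketchBackend end          # plays the role of AutoEnzyme
struct SketchDeferredEnzyme <: SketchBackend end  # plays the role of AutoDeferredEnzyme

# Placeholder engines: a real implementation would call Enzyme's autodiff
# or autodiff_deferred here. They only tag the result with the chosen path.
direct_engine(f, x) = (:autodiff, f(x))
deferred_engine(f, x) = (:autodiff_deferred, f(x))

# The backend type selects the engine, so each operator is written once.
engine(::SketchEnzyme) = direct_engine
engine(::SketchDeferredEnzyme) = deferred_engine

# Backend to use when this backend sits inside another differentiation:
# the direct Enzyme backend is swapped for its deferred wrapper,
# every other backend is left unchanged.
nested(::SketchEnzyme) = SketchDeferredEnzyme()
nested(b::SketchBackend) = b

# A single operator definition that covers both code paths.
run_operator(backend::SketchBackend, f, x) = engine(backend)(f, x)

first_order = run_operator(SketchEnzyme(), sin, 0.0)
inner_of_second_order = run_operator(nested(SketchEnzyme()), sin, 0.0)

println(first_order)
println(inner_of_second_order)
```

The point of the design is that the choice between the direct and the deferred path lives in the backend type, so the public operators do not need a second "higher-order friendly" variant.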
