Taking gradients in Julia

ZHEN_LIN · September 24, 2021, 11:45pm

I’m looking for a fast package that can take gradients of functions I write in Julia, something similar to the autograd package for Python.

It seems that there are a few different options such as Zygote, ForwardDiff.jl, ReverseDiff.jl and many others.

What would be a fast (or maybe the fastest) and accurate option to take gradients in Julia?

Thanks.

stevengj · September 24, 2021, 11:58pm

For a summary of options, see: https://juliadiff.org/

If you have a few parameters and a lot of functions to differentiate, probably use ForwardDiff (forward-mode differentiation); if you have a lot of parameters and a few functions, probably use Zygote or maybe ReverseDiff (reverse-mode differentiation).
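To make the forward-mode side concrete, here is a minimal sketch in Python. It is not Julia and not ForwardDiff.jl's implementation, although ForwardDiff.jl is also built on dual numbers. The names `Dual`, `forward_gradient` and `f` are made up for illustration. The point is that the full gradient of a function with n inputs takes n forward passes, one per seeded input direction.

```python
import math

class Dual:
    """Dual number val + der*eps with eps**2 == 0; `der` carries a directional derivative."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def _lift(self, other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule.
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def sin(x):
    """sin that also propagates derivatives through Dual inputs (chain rule)."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.der)
    return math.sin(x)

def forward_gradient(f, x):
    """Gradient of f: R^n -> R by forward mode, seeding one input direction per pass."""
    grad = []
    for i in range(len(x)):
        seeded = [Dual(xj, 1.0 if j == i else 0.0) for j, xj in enumerate(x)]
        grad.append(f(seeded).der)
    return grad, len(x)  # second value: number of passes == number of inputs

def f(x):
    # f(x) = x0*x1 + sin(x2)
    return x[0] * x[1] + sin(x[2])

g, passes = forward_gradient(f, [2.0, 3.0, 0.0])
print(g, passes)  # [3.0, 2.0, 1.0] 3
```

Each pass is cheap (roughly a constant factor over evaluating `f` itself), but the number of passes grows with the number of inputs. That is why forward mode suits problems with few parameters.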

ZHEN_LIN · September 25, 2021, 1:50am

Thanks! What would be the reason or intuition behind the difference in the packages’ performance in the cases you mentioned?

ChrisRackauckas · September 25, 2021, 3:46am

We’re putting out a paper in a few days that will go through quite a few examples of AD package performance and how it differs, but one discussion of this can be found in the following paper:

You might want to watch the talk that explains the results:

but one slide that’s really relevant:

Essentially, the cost of forward-mode methods scales with the number of inputs while reverse mode scales with the number of outputs, and in many applications this can look like O(states * parameters) for forward vs O(states + parameters) for reverse. So obviously reverse is better, right? Wrong: there are many natural reasons why reverse-mode AD has a higher baseline overhead.
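Unlike forward mode, reverse mode records the computation on one forward pass and then sweeps backwards once. That single sweep yields every partial derivative of a scalar output, however many inputs there are. Below is a minimal Python sketch of that idea. It is an illustration only, not the internals of Zygote.jl, ReverseDiff.jl or Enzyme.jl, and `Var`, `backward` and `reverse_gradient` are hypothetical names. The graph bookkeeping it needs (storing nodes, ordering them, accumulating adjoints) hints at where reverse mode's extra baseline overhead comes from.

```python
import math

class Var:
    """A node in the recorded computation graph."""
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # sequence of (parent_node, local_partial_derivative)
        self.grad = 0.0         # adjoint: d(output)/d(this node), filled in by backward()

    def __add__(self, other):
        return Var(self.val + other.val, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.val * other.val, [(self, other.val), (other, self.val)])

def sin(x):
    return Var(math.sin(x.val), [(x, math.cos(x.val))])

def backward(out):
    """One reverse sweep: push d(out)/d(node) from the output back to every input."""
    # Post-order DFS puts every node after its parents, so reversing it
    # processes each node only after all of its consumers have contributed.
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad

def reverse_gradient(f, x):
    inputs = [Var(v) for v in x]  # forward pass records the graph
    backward(f(inputs))           # a single backward pass gives all partials
    return [v.grad for v in inputs]

def f(x):
    # f(x) = x0*x1 + sin(x2)
    return x[0] * x[1] + sin(x[2])

print(reverse_gradient(f, [2.0, 3.0, 0.0]))  # [3.0, 2.0, 1.0]
```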

So if forward-mode AD is faster when problems are small and reverse-mode AD is faster when problems are large, where’s the cutoff? That’s very problem-dependent, and the other paper to be posted soon will show that the problem itself can also change which AD packages are fast. But one thing to look at is the following:

We found that at around a size-50 system, some reverse-mode methods (“based on” Enzyme.jl) became faster than ForwardDiff.jl, and at around 100-150 some versions of ReverseDiff.jl reached the cutoff as well. So “roughly 100” is a decent rule of thumb for switching from forward to reverse, depending on the properties of the package.

There are more details about taking gradients in this tweet thread: https://twitter.com/ChrisRackauckas/status/1440018868269985796

When will Zygote be faster than Enzyme, and so on? I’ll post an update on that in a day or two.

ZHEN_LIN · September 25, 2021, 10:47pm

Thanks a lot! Looking forward to this update!

I’m also curious whether you observe any “significant” difference in speed among these packages, or whether they are roughly comparable even if one is slightly faster in the situations you described.

ChrisRackauckas · September 25, 2021, 10:56pm

It can be an order of magnitude difference.

stevengj · September 25, 2021, 10:58pm

It’s not just a difference in software. There is a fundamental difference in algorithms and computational scaling between forward and reverse mode AD, as I said, and which one is better depends on the number of inputs vs the number of outputs; google it.

ChrisRackauckas · September 28, 2021, 1:26am

Here’s the paper I mentioned. Its Appendix B describes how, on the same application, 4 or 5 different AD mechanisms can each be the optimal choice depending on the user inputs.

This paper also conveniently describes AbstractDifferentiation.jl which is a higher level API for using any AD system, which I would recommend for handling this complexity.

https://github.com/JuliaDiff/AbstractDifferentiation.jl
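The idea behind a higher-level API like this can be sketched in a few lines of Python. User code calls one `gradient(backend, f, x)` function, so switching differentiation strategies only means changing the backend argument. This is a hypothetical illustration of the design, not AbstractDifferentiation.jl’s actual API, and all names in it are made up. To keep it short, the second backend is plain central finite differences rather than an AD package.

```python
class Dual:
    """Minimal dual number supporting + and * (enough for polynomials)."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

class ForwardDualBackend:
    """Forward-mode AD: one dual-number pass per input."""
    def gradient(self, f, x):
        return [f([Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(x)]).der
                for i in range(len(x))]

class CentralDifferenceBackend:
    """Numerical differentiation (not AD); useful as a cross-check."""
    def __init__(self, h=1e-6):
        self.h = h

    def gradient(self, f, x):
        grad = []
        for i in range(len(x)):
            up = list(x); up[i] += self.h
            dn = list(x); dn[i] -= self.h
            grad.append((f(up) - f(dn)) / (2 * self.h))
        return grad

def gradient(backend, f, x):
    """Backend-agnostic entry point: user code stays the same, only `backend` changes."""
    return backend.gradient(f, x)

def f(x):
    # f(x) = x0*x1 + x2^2, so the gradient is [x1, x0, 2*x2]
    return x[0] * x[1] + x[2] * x[2]

x = [2.0, 3.0, 4.0]
for backend in (ForwardDualBackend(), CentralDifferenceBackend()):
    print(type(backend).__name__, gradient(backend, f, x))
```

Because every backend exposes the same method, trying several strategies on the same problem amounts to a loop over backends.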
