What is the difference between Zygote, ForwardDiff, and ReverseDiff?

As far as I can tell from the documentation, both ForwardDiff and ReverseDiff can already do automatic differentiation on arbitrary Julia functions. Also, Zygote depends on ForwardDiff. What unique feature does Zygote bring to the picture?

3 Likes

I am not an expert, so feel free to wait for answers from other users; they are much more experienced than I am.
I am studying deep learning, and with Zygote I can easily differentiate Flux neural-network models.
It is also much faster than the others you mentioned (I have tried it against ReverseDiff with some simple feedforward networks; a sketch of one way to run such a comparison follows below).
As far as I know (I'm a student), forward-mode automatic differentiation is used when you have few parameters; otherwise you have to use reverse-mode automatic differentiation.
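A minimal sketch of one way to time such a comparison yourself, assuming BenchmarkTools is installed (the function f is a hypothetical stand-in for a network's loss, not a real Flux model):

using Zygote, ReverseDiff, BenchmarkTools

f(x) = sum(abs2, tanh.(x))  # toy elementwise "loss": many inputs, one output
x = randn(1000)

@btime Zygote.gradient($f, $x)       # source-to-source reverse mode
@btime ReverseDiff.gradient($f, $x)  # tape-based reverse mode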


No, none of the libraries you mentioned can differentiate truly arbitrary Julia functions; they all have limitations.


Where did you read that Zygote depends on ForwardDiff?


For the study of deep learning I always use Zygote.
Sorry for my English.

1 Like

Thank you for your kind response. You mentioned Zygote being faster than ReverseDiff. Is this generally the case? Are there circumstances in which ForwardDiff or ReverseDiff performs faster than Zygote?

The main reason I made this post is that I would like to know the main limitations of ForwardDiff/ReverseDiff compared with Zygote (and vice versa). From the documentation of these libraries (Limitations of ForwardDiff · ForwardDiff and Limitations of ReverseDiff · ReverseDiff.jl), the requirements seem very lenient. One notable limitation of these two libraries is that they do not support mutation, but Zygote does not seem to support mutation either.
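For the record, a minimal sketch of what that limitation looks like in Zygote (the loss! function here is hypothetical, written only to trigger the error):

using Zygote

function loss!(x)
    x[1] = 2.0   # in-place mutation of the argument
    return sum(x)
end

Zygote.gradient(loss!, ones(3))  # errors: "Mutating arrays is not supported"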

ForwardDiff is listed as a dependency of Zygote in Project.toml (https://github.com/FluxML/Zygote.jl/blob/6b89a068e40bad9673e163e9aee43f2bc4940242/Project.toml).

2 Likes

For a few-sentence summary of those (and several more) AD packages, see https://juliadiff.org/.

Zygote and ReverseDiff are both reverse-mode AD, but while ReverseDiff pushes custom tracked types through your code to record a tape for the backward pass (hence your code must be written to accept generic argument types), Zygote effectively rewrites the source code of your functions and so works through more arbitrary code. For example, this fails:

using ReverseDiff

ReverseDiff.gradient((x::Vector{Float64}) -> sum(x), ones(10))
# MethodError: ReverseDiff calls the closure with a TrackedArray, not a Vector{Float64}

but replacing ReverseDiff with Zygote works. (Of course, in this trivial example it's easy to make ReverseDiff work by just dropping the type annotation, but often it's not that easy, especially if the code you're differentiating is in someone else's package.)
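For comparison, a minimal sketch of the same call through Zygote; the annotated closure can stay as-is, since Zygote calls it with the original Vector{Float64}:

using Zygote

Zygote.gradient((x::Vector{Float64}) -> sum(x), ones(10))  # ([1.0, 1.0, ...],)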

The dependency of Zygote on ForwardDiff is just for a small piece used when broadcasting over CuArrays; Zygote is still reverse mode.

7 Likes

To clarify: “reverse mode” AD is efficient when you have a function $f(x)$ with a small number of outputs $f_i$ and many inputs $x_j$ (in computing $\partial f_i/\partial x_j$), i.e. for functions mapping $x\in\mathbb{R}^m$ to $f \in \mathbb{R}^n$ with $n \ll m$. (For example, in neural-network training, where you want the derivative of one loss function ($n=1$) with respect to millions ($m$) of network parameters.) The “manual” application of such a technique is also known as an adjoint method, and in the neural-net case it is called backpropagation.
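A minimal sketch of that regime (a hypothetical toy quadratic loss): one backward pass yields all $m$ partial derivatives at roughly the cost of a few evaluations of the loss itself.

using Zygote

m = 1_000_000                  # many inputs, one scalar output
x = randn(m)
loss(p) = sum(abs2, p)         # n = 1

g, = Zygote.gradient(loss, x)  # a single reverse sweep gives all m partials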

In contrast, forward-mode AD (as in ForwardDiff.jl) is better when there is a small number of inputs and a large number of outputs, i.e. when $n \gg m$: when you are computing many functions of a few variables. (It essentially corresponds to “manual” application of the chain rule in the most obvious way.)
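Again a minimal sketch (a hypothetical toy map with $m = 2$ inputs and $n = 1000$ outputs); the full Jacobian is cheap here because there are only two inputs to sweep over:

using ForwardDiff

f(x) = [sin(k * x[1]) + x[2] for k in 1:1000]   # maps R^2 to R^1000

J = ForwardDiff.jacobian(f, [1.5, 2.0])         # 1000×2 Jacobian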

19 Likes