Does Zygote differentiate symbolically?

Euhan · March 23, 2023, 6:40pm

Sorry for posting this question in what may not be exactly the right category! This was my best guess at the group with the biggest overlap with my question.

My question is as simple as it says in the title. Does Zygote differentiate symbolically?

I would also like to know:

If it does, can the results be seen in some way?
Are there other packages that differentiate symbolically, and if so, can their results be seen?
How (and where) does Zygote overload the single quote (')?

contradict · March 23, 2023, 8:16pm

My question is as simple as it says in the title. Does Zygote differentiate symbolically?

No. Automatic differentiation (AD) is distinct from symbolic differentiation. Zygote can perform forward-mode or reverse-mode automatic differentiation by source rewriting.

If it does, can the results be seen in some way?

Sort of, although it is not very helpful. There is an example in the documentation.

Are there other packages that differentiate symbolically, and if so, can their results be seen?

Symbolics.jl can differentiate symbolic expressions.
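To make the contrast concrete, here is a minimal sketch of symbolic differentiation in plain Julia, with no packages. The function name `sym_derivative` is hypothetical (it is not Symbolics.jl's API), and it only knows rules for numbers, a single variable, `+`, binary `*`, `sin` and `cos`. The key point is that its output is itself an expression, which can then be printed or turned into a function.

```julia
# Hypothetical toy symbolic differentiator over Julia expressions (not Symbolics.jl).
sym_derivative(ex::Number, x::Symbol) = 0
sym_derivative(ex::Symbol, x::Symbol) = ex == x ? 1 : 0

function sym_derivative(ex::Expr, x::Symbol)
    ex.head == :call || error("unsupported expression: $ex")
    op, args = ex.args[1], ex.args[2:end]
    if op == :+
        # sum rule: differentiate each term
        return Expr(:call, :+, (sym_derivative(a, x) for a in args)...)
    elseif op == :* && length(args) == 2
        u, v = args
        # product rule: (u*v)' = u'*v + u*v'
        return :($(sym_derivative(u, x)) * $v + $u * $(sym_derivative(v, x)))
    elseif op == :sin
        u = args[1]
        # chain rule: sin(u)' = cos(u) * u'
        return :(cos($u) * $(sym_derivative(u, x)))
    elseif op == :cos
        u = args[1]
        # chain rule: cos(u)' = -sin(u) * u'
        return :(-sin($u) * $(sym_derivative(u, x)))
    else
        error("no rule for $op")
    end
end

d = sym_derivative(:(x * sin(x)), :x)   # an unsimplified Expr equal to sin(x) + x*cos(x)
println(d)

# Turn the derivative expression into a function and evaluate it numerically.
dfun = eval(:(x -> $d))
println(Base.invokelatest(dfun, 2.0))   # should match sin(2.0) + 2.0 * cos(2.0)
```

Even for this tiny input the printed result carries redundant factors of 1; a real computer algebra system would also simplify it.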

How (and where) does Zygote overload the single quote (')?

I didn’t know about this! Can you point out an example of the syntax you mean?

Euhan · March 24, 2023, 10:46am

f'(x) denotes the derivative of f(x).

E.g. sin'(0) returns 1.

I imagined that differentiation could be done either symbolically or numerically, and that automatic differentiation simply spared you from doing it yourself by doing one of the two for you. What does Zygote actually do (not exactly; a broad outline will do fine)?

johnmyleswhite · March 24, 2023, 11:28am

It sounds like you’re not familiar with the concept of AD. It’s worth starting with the simplified problem of using dual numbers: JuliaDiff/DualNumbers.jl on GitHub (a Julia package for representing dual numbers and for performing dual algebra).

Read that code, and then read the Wikipedia page on AD (“Automatic differentiation”).
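As a taste of the dual-number idea, here is a minimal sketch in plain Julia. It is not DualNumbers.jl's API: the type `Dual` and the function `derivative` are hypothetical names, and only `+`, `*`, `sin` and `cos` are supported. Each value carries its derivative along, and every operation applies its differentiation rule numerically, so no expression is ever built.

```julia
# Minimal forward-mode AD with dual numbers (a sketch, not DualNumbers.jl).
struct Dual
    val::Float64   # value of the expression
    der::Float64   # derivative of the expression with respect to the input
end

Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)  # product rule
Base.:+(a::Dual, c::Real) = Dual(a.val + c, a.der)
Base.:+(c::Real, a::Dual) = a + c
Base.:*(a::Dual, c::Real) = Dual(a.val * c, a.der * c)
Base.:*(c::Real, a::Dual) = a * c
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)    # chain rule
Base.cos(a::Dual) = Dual(cos(a.val), -sin(a.val) * a.der)   # chain rule

# Seed the input with derivative 1 and read the derivative off the result.
derivative(f, x::Real) = f(Dual(x, 1.0)).der

println(derivative(x -> x * sin(x) + 3x, 2.0))   # equals sin(2) + 2cos(2) + 3
```

Nothing here is approximated: each rule is applied exactly, so the result agrees with the analytic derivative up to floating-point rounding.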

Tomas_Pevny · March 24, 2023, 12:24pm

To promote our lectures for students: you can read about AD here
https://juliateachingctu.github.io/Scientific-Programming-in-Julia/dev/lecture_08/lecture/
It is meant to be very introductory.

contradict · March 24, 2023, 3:13pm

Oh, that is Base.adjoint, which can also be spelled '. The easy way to find Zygote's new method is to run @edit sin'(0) in a REPL; that will bring up the line that defines it in your editor.
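Because `f'` lowers to `Base.adjoint(f)`, any package can give `'` a meaning for functions by adding an `adjoint` method. The sketch below is a hypothetical toy, not Zygote's code: it looks `f` up in a small hand-written rule table (`DERIVATIVE_RULES` is an illustrative name), whereas Zygote's own method computes the derivative with its AD machinery. Adding a method to a Base function for types the script does not own is so-called type piracy, tolerable here only as a demonstration.

```julia
# Hypothetical toy: make `f'` return a hand-written derivative rule for f.
# Zygote's real method computes the derivative with AD instead of a lookup table.
const DERIVATIVE_RULES = Dict{Function,Function}(
    sin => cos,
    cos => (x -> -sin(x)),
    exp => exp,
)

# `f'` lowers to `Base.adjoint(f)`, so this is the method `sin'` ends up calling.
# (Type piracy: Base function, Base types; acceptable only for a demonstration.)
Base.adjoint(f::Function) = DERIVATIVE_RULES[f]

println(sin'(0))     # cos(0)
println(exp'(1.0))   # exp(1.0)
```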

zdenek_hurak · March 24, 2023, 4:49pm

This is why I prefer the other expansion of the AD abbreviation: algorithmic differentiation. I think it reflects the essence better than automatic differentiation does. Anyway, AD is distinct from both symbolic and numerical differentiation (see my post #5 in the thread “Simple numerical differentiation”).

Euhan · March 26, 2023, 1:32pm

OK, I’m going to summarise what I have gleaned from links in the answers (and to some extent the answers themselves). This is so that others can identify any misunderstandings I might be labouring under.

Let me start by refining my question a bit (so that we don't end up with diverging ideas about what a given concept or term means in this context). I will be giving a talk about Julia at my university department in the near future. I (and, I think, my audience) want to know whether Zygote uses finite differences to calculate derivatives.

As far as I understand it, derivatives are taken from rules and associated with the corresponding values. These derivatives are then evaluated and accumulated in numeric form.

To evaluate a derivative we need its arguments, meaning that the intermediate results have to be kept from the evaluation of the overall function if we want to accumulate the derivatives from the outermost function (in ML, typically the loss function) inwards.
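That description matches reverse-mode AD. Here is a minimal, hand-unrolled sketch for y = sin(x^2) * x; the function name `value_and_derivative` is hypothetical, and a real AD system such as Zygote derives the equivalent of the reverse pass automatically from the program rather than having it written by hand. The forward pass stores the intermediates, and the reverse pass applies each local rule from the output inwards, using those stored values.

```julia
# Hand-unrolled reverse-mode sketch for y = sin(x^2) * x (illustrative only).
function value_and_derivative(x)
    # Forward pass: evaluate and keep every intermediate.
    a = x * x        # a = x^2
    b = sin(a)       # b = sin(a)
    y = b * x        # y = b * x

    # Reverse pass: start from dy/dy = 1 and apply each local rule, outermost first.
    ybar = 1.0
    bbar = ybar * x          # dy/db = x
    xbar = ybar * b          # dy/dx along the direct path = b
    abar = bbar * cos(a)     # db/da = cos(a): needs the stored intermediate a
    xbar += abar * 2x        # da/dx = 2x, added to the contribution from the other path
    return y, xbar
end

y, dydx = value_and_derivative(1.3)   # dydx equals 2x^2*cos(x^2) + sin(x^2) at x = 1.3
```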

The differences between algorithmic differentiation (A), numerical differentiation (N) and symbolic differentiation (S) could be summarised in a table:

                                                        ┏━━━━━┳━━━━━━━┓
                                                        ┃ sym ┃  num  ┃
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━╋━━━━━━━┫
┃differentiating primitive parts ("leaves in the graph")┃ A,S ┃ (A),N ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━╋━━━━━━━┫
┃accumulating the derivative ("non-leaves")             ┃  S  ┃  A,N  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━┻━━━━━━━┛

Here, sym and num indicate whether the part in question is computed symbolically or numerically. I surmise that algorithmic differentiation could potentially fall back to numerical evaluation of “unfamiliar” nodes, which is why I put an A in parentheses in the upper-right cell.

Apart perhaps from the minutest semantic details, I would like to know if this is a correct description.

CameronBieganek · March 26, 2023, 2:17pm

Links to introductory or survey journal articles:

https://www.jstor.org/stable/24103956

Euhan · March 28, 2023, 10:20am

Thanks again for all the references.

Above, I gave a short explanation based on what I have learned from those references. What I would like to know is:

Is this explanation correct? If not, what is wrong?

stevengj · March 28, 2023, 11:49am

No. Zygote and other AD systems never use finite differences.

Essentially, they accumulate vector–Jacobian products, working backwards from outputs to inputs (for reverse-mode AD; or Jacobian–vector products working from input to output for forward mode). These products are computed by an equivalent of symbolic differentiation for individual computational steps, expressed in low-level compiler building blocks rather than in high-level symbolic expressions/code. If it hits a function call that it cannot analyze (e.g. foreign function calls, mutation, …) then it fails with an error, and you need to supply a manual vector–Jacobian product (an rrule or “pullback” via ChainRules.jl for Zygote) for that step.
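The vector–Jacobian-product idea can be sketched with hand-written rules in the spirit of an rrule. The helpers below (`sin_with_pullback`, `sum_with_pullback`, `gradient_g`) are hypothetical and simpler than ChainRules.jl's actual `rrule(f, args...)`, whose pullback also returns a tangent for the function itself. Each helper returns its result together with a pullback closure, and the gradient of `sum(sin.(x))` is obtained by running the pullbacks from the output back to the input, without ever forming the (diagonal) Jacobian of `sin.`.

```julia
# Hypothetical rrule-style helpers: each returns the primal result and a pullback
# mapping an output cotangent to an input cotangent (a vector-Jacobian product).
function sin_with_pullback(x::AbstractVector)
    y = sin.(x)
    pullback(ybar) = ybar .* cos.(x)         # VJP with the diagonal Jacobian, never materialized
    return y, pullback
end

function sum_with_pullback(v::AbstractVector)
    s = sum(v)
    pullback(sbar) = fill(sbar, length(v))   # ds/dv_i = 1 for every i
    return s, pullback
end

# Gradient of g(x) = sum(sin.(x)): forward through both steps, then pull back in reverse order.
function gradient_g(x)
    y, back_sin = sin_with_pullback(x)
    s, back_sum = sum_with_pullback(y)
    return back_sin(back_sum(1.0))
end

println(gradient_g([0.1, 0.5, 2.0]))   # elementwise equal to cos.(x)
```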

Euhan · March 29, 2023, 10:42am

So I conclude that my description was correct. To clarify: I only stated that I could not rule out numerical differentiation being used in some algorithmic differentiation system. Also, I can no longer edit the post in question.

Also, if I get to write my own differentiation rules for unknown functions, what’s to stop me from writing a rule like \frac{df}{dx}=\frac{f(x+\delta)-f(x)}{\delta}?

Is there any non-semantic reason that an algorithmic differentiation system couldn’t use a numerical derivative for a certain group of primitives?
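To see what such a rule costs in accuracy, here is a small plain-Julia experiment (the helper `forward_difference` is a hypothetical name) comparing the forward-difference quotient for `sin` at x = 1 with the exact derivative `cos(1)`. With a large δ the truncation error dominates; with a very small δ, cancellation in f(x + δ) - f(x) typically makes the error grow again, so the quotient never reaches full floating-point accuracy.

```julia
# Forward-difference estimate (f(x + δ) - f(x)) / δ, compared with the exact derivative.
forward_difference(f, x, δ) = (f(x + δ) - f(x)) / δ

x0 = 1.0
for δ in (1e-1, 1e-4, 1e-8, 1e-12)
    err = abs(forward_difference(sin, x0, δ) - cos(x0))   # exact: d/dx sin = cos
    println("δ = $δ   error = $err")
end
```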

stevengj · March 29, 2023, 10:52am

Sure, you could do that. You could also write an incorrect derivative rule if you want. Or your derivative rule could query ChatGPT. Or your function could send an email to your high-school calculus teacher and wait until it receives a response. Code can do lots of things.

There’s nothing physically preventing it. Most people would consider that a bug in an AD system, though.

(A finite difference like that would be noticeable, even without looking inside the code, because it would have the wrong performance scaling: its cost grows in proportion to the number of inputs, whereas the cost of reverse-mode AD grows in proportion to the number of outputs. It would also have unacceptably large numerical errors.)
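The scaling argument can be checked by counting function calls. The sketch below (`fd_gradient` is a hypothetical helper) estimates the gradient of a scalar function of n inputs by forward differences and reports how many times f was evaluated: n + 1, one per input plus the base point. Reverse-mode AD, by contrast, obtains the whole gradient of a scalar output from one forward pass plus one reverse pass, at a cost that is a small constant multiple of evaluating f, regardless of n.

```julia
# Forward-difference gradient of a scalar function f of n inputs (a sketch).
# Returns the gradient estimate and the number of times f was evaluated.
function fd_gradient(f, x::Vector{Float64}; δ = 1e-6)
    calls = Ref(0)
    counted(z) = (calls[] += 1; f(z))   # wrapper that counts evaluations of f
    f0 = counted(x)                     # one evaluation at the base point
    g = similar(x)
    for i in eachindex(x)
        xp = copy(x)
        xp[i] += δ                      # perturb one input at a time
        g[i] = (counted(xp) - f0) / δ   # one extra evaluation per input
    end
    return g, calls[]
end

g, calls = fd_gradient(z -> sum(abs2, z), ones(10))
println(calls)   # 11 evaluations for 10 inputs
```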

Euhan · March 29, 2023, 3:45pm

Finite differences are clearly good enough in some applications, and perhaps even the only alternative. Should we then understand that people doing that would have nothing to gain from introducing rule-based derivatives wherever possible?

Oscar_Smith · March 29, 2023, 3:50pm

You can very often at least use dual numbers, which are more numerically robust and typically give higher accuracy.

stevengj · March 29, 2023, 7:58pm

Only if you consider accuracy and scalability to be nothing. Which maybe they are, if finite differences are good enough and you have other concerns to focus on.

Euhan · March 30, 2023, 11:20pm

So, if and when they do this, they will have implemented an algorithmic differentiation system that partially uses finite differences.

DNF · March 31, 2023, 5:10am

They would then have made a differentiation system that mixes AD and finite differences.

It seems like you are now mainly engaging in semantic bickering in order to score some far-fetched point. If you think a hybrid system would be useful, then that’s fine, but there’s no reason to insist on it being fully AD.

Euhan · March 31, 2023, 10:43am

That is a correct analysis (although I don’t exclude the possibility that a hybrid differentiation system would be useful). The point, the importance of which you so rightly belittle, is that I asked for a non-semantic reason.

The bickering may be connected to a certain annoyance that nobody wrote the one or two sentences needed to convey the information in my table (possibly without the parenthesized letter), which would have helped in understanding both what Zygote does and how the different differentiation strategies differ.

ToucheSir · March 31, 2023, 2:18pm

Frankly, I just found the table more confusing because it wasn’t clear what each of the columns/rows/cells meant. In the spirit of a one-liner response, though, I remember someone (maybe @ChrisRackauckas?) summarizing it as “AD is like symbolic differentiation, just with = [instead of deep nested expression trees]”. I am probably butchering the quote, but it should make clear that finite differences don’t even enter the picture here.
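The quoted one-liner can be illustrated in plain Julia. The hypothetical function `nested_sin_and_derivative` below differentiates a `depth`-fold composition of `sin` by updating a value and a derivative with ordinary assignments, which is essentially forward-mode AD written by hand. Each step reuses the stored intermediate `v`, so the work grows linearly with `depth`. Written out as a single expression without shared subexpressions, the symbolic derivative has `depth` cosine factors whose arguments are nested up to `depth - 1` levels deep, so its size grows roughly quadratically.

```julia
# Hand-written forward accumulation for f(x) = sin(sin(...sin(x)...)), `depth` levels deep.
# Each step overwrites the running value and derivative ("with =") instead of
# nesting an ever larger expression.
function nested_sin_and_derivative(x, depth)
    v, dv = x, one(x)                # v: current value, dv: derivative of v with respect to x
    for _ in 1:depth
        v, dv = sin(v), cos(v) * dv  # chain rule; the right-hand side uses the previous v
    end
    return v, dv
end

v, dv = nested_sin_and_derivative(0.7, 3)
# Written out symbolically, the same derivative is cos(sin(sin(x))) * cos(sin(x)) * cos(x).
println(dv)
```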
