Does Zygote differentiate symbolically?

Sorry for posting this question in what may not be exactly the right category! This was my best guess at the group with the biggest overlap with my question.

My question is as simple as it says in the title. Does Zygote differentiate symbolically?

I would also like to know:

  • If it does, can the results be seen in some way?
  • Are there other packages that differentiate symbolically + previous question?
  • How (and where) does Zygote overload the single quote (')?

My question is as simple as it says in the title. Does Zygote differentiate symbolically?

No. Automatic differentiation (AD) is distinct from symbolic differentiation. Zygote performs reverse-mode automatic differentiation by source-to-source rewriting; forward mode in Julia is usually done with ForwardDiff.jl.

If it does, can the results be seen in some way?

Sort of, although it is not very helpful. There is an example in the documentation.

Are there other packages that differentiate symbolically + previous question?

Symbolics.jl can differentiate symbolic expressions.

How (and where) does Zygote overload the single quote (')?

I didn’t know about this! Can you point out an example of the syntax you mean?

f'(x) denotes the derivative of f(x).

E.g. sin'(0) returns 1.

In my mind, I imagined that differentiation could be done symbolically or numerically, and that automatic differentiation obviated the need to do it oneself by doing one of them. What does Zygote do exactly (not exactly, a broad outline will do fine)?

It sounds like you’re not familiar with the concept of AD. It’s worth starting with the simplified problem of using dual numbers: see JuliaDiff/DualNumbers.jl on GitHub, a Julia package for representing dual numbers and for performing dual algebra.

Read that code and then read the Wikipedia page on automatic differentiation.
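
To make the dual-number idea concrete, here is a minimal sketch in plain Julia. It is not the DualNumbers.jl implementation: the `Dual` struct and the `derivative` helper are hypothetical names made up for illustration, and only the handful of operations needed for the example are defined. Each primitive carries its exact derivative rule, and the results are combined numerically, with no finite differences and no symbolic expression tree.

```julia
# Minimal dual number: a value plus the coefficient of an infinitesimal ε with ε^2 = 0.
# (Hypothetical type for illustration, not the one from DualNumbers.jl.)
struct Dual
    val::Float64   # function value
    der::Float64   # derivative carried alongside the value
end

# Each primitive propagates the value and applies its exact derivative rule.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:*(c::Real, a::Dual) = Dual(c * a.val, c * a.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# Hypothetical helper: seed der = 1 to differentiate with respect to x.
derivative(f, x) = f(Dual(x, 1.0)).der

g(x) = x * sin(x) + 3 * x
derivative(g, 2.0)   # analytically: sin(2) + 2cos(2) + 3
```

Note that `g` is ordinary Julia code; the derivative falls out of running it on a different number type.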

Promoting our lectures for students: you can read about AD here:
https://juliateachingctu.github.io/Scientific-Programming-in-Julia/dev/lecture_08/lecture/
It is meant to be very introductory.

Oh, that is Base.adjoint, which can also be spelled ' . The easy way to find the new method is to run @edit sin'(0) in a REPL, which will bring up the defining line in your editor.
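
As a rough sketch of the mechanism (not Zygote's actual code), the following shows how adding a method to `Base.adjoint` gives the postfix `'` a meaning for a callable object. The `Differentiable` wrapper and the `derivative_rule` table are hypothetical names introduced here; a wrapper type is used so that the sketch does not add methods to Base for all functions, and the "derivative" is just a lookup of a known rule rather than real AD.

```julia
# Hypothetical wrapper type, so that we only extend Base.adjoint for our own type.
struct Differentiable{F}
    f::F
end
(d::Differentiable)(x) = d.f(x)

# Hypothetical rule table: exact derivatives of a few primitives.
derivative_rule(::typeof(sin)) = cos
derivative_rule(::typeof(exp)) = exp

# Postfix ' calls Base.adjoint, so this method makes d' return the derivative function.
Base.adjoint(d::Differentiable) = Differentiable(derivative_rule(d.f))

s = Differentiable(sin)
s'(0.0)   # evaluates cos(0.0)
```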

This is why I prefer the other interpretation of the AD abbreviation: Algorithmic Differentiation. I think it better reflects the essence than Automatic Differentiation. Anyway, AD is distinct from both symbolic and numerical differentiation; see the thread "Simple numerical differentiation", post #5 by zdenek_hurak.

OK, I’m going to summarise what I have gleaned from links in the answers (and to some extent the answers themselves). This is so that others can identify any misunderstandings I might be labouring under.

Let me start by refining my question a bit (so that we do not have diverging ideas about what a certain concept or term means in this context). I will be giving a talk about Julia at my university department (at some point in the near future). I (and I think my audience) want to know whether Zygote uses finite differences to calculate derivatives.

As far as I understand it, derivatives are taken from rules and associated with the corresponding values. These derivatives are then evaluated and accumulated in numeric form.

To evaluate a derivative we need its arguments, meaning that the intermediate results have to be kept from the evaluation of the overall function if we want to accumulate the derivatives from the outermost function (in ML, typically the loss function) inwards.

A simple explanation of the differences between Algorithmic Differentiation (A), Numeric Differentiation (N) and Symbolic Differentiation (S) could be shown in a figure:

                                                        ┏━━━━━┳━━━━━━━┓
                                                        ┃ sym ┃  num  ┃
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━╋━━━━━━━┫
┃differentiating primitive parts ("leaves in the graph")┃ A,S ┃ (A),N ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━╋━━━━━━━┫
┃accumulating the derivative ("non-leaves")             ┃  S  ┃  A,N  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━┻━━━━━━━┛

Here sym and num indicate whether the part in question is calculated symbolically or numerically. I surmise that Algorithmic Differentiation could potentially revert to numerical evaluation of “unfamiliar” nodes, and that is why I put an A in parentheses in the upper right corner.

Apart perhaps from the minutest semantic details, I would like to know if this is a correct description.

Links to introductory or survey journal articles:

https://www.jstor.org/stable/24103956

Thanks again for all the references.

I have above given a short explanation, based on what I have learned from said references. What I would like to know is:

Is this explanation correct? If not, what is wrong?

No. Zygote and other AD systems never use finite differences.

Essentially, they accumulate vector–Jacobian products, working backwards from outputs to inputs (for reverse-mode AD; or Jacobian–vector products working from input to output for forward mode). These products are computed by an equivalent of symbolic differentiation for individual computational steps, expressed in low-level compiler building blocks rather than in high-level symbolic expressions/code. If it hits a function call that it cannot analyze (e.g. foreign function calls, mutation, …) then it fails with an error, and you need to supply a manual vector–Jacobian product (an rrule or “pullback” via ChainRules.jl for Zygote) for that step.
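
The reverse-mode accumulation described above can be sketched by hand in plain Julia for a scalar function. This is only an illustration under assumptions: the names `sin_with_pullback`, `exp_with_pullback` and `value_and_derivative` are made up here, and a real `rrule` in ChainRules.jl has a different signature. What it shows is the pattern: each primitive returns its value plus a closure holding the exact local derivative rule, the forward pass keeps those closures (and thus the intermediate values), and the backward pass multiplies the sensitivities numerically from the output inwards.

```julia
# Each primitive returns its value together with a pullback closure.
# The closure captures the input it needs and maps an output sensitivity
# to an input sensitivity using the exact derivative rule.
sin_with_pullback(x) = (sin(x), dy -> dy * cos(x))
exp_with_pullback(x) = (exp(x), dy -> dy * exp(x))

# Hand-written reverse pass for f(x) = exp(sin(x)).
function value_and_derivative(x)
    a, back_sin = sin_with_pullback(x)   # forward pass: keep the closures
    y, back_exp = exp_with_pullback(a)
    da = back_exp(1.0)                   # backward pass: outermost step first
    dx = back_sin(da)
    return y, dx
end

y, dx = value_and_derivative(0.5)   # dx equals exp(sin(0.5)) * cos(0.5)
```

An AD system such as Zygote generates the equivalent of `value_and_derivative` from the source of `f`, instead of having the user write it.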

So, I conclude that my description was correct. To clarify: I only stated that I could not guarantee that numerical differentiation is never used in some algorithmic differentiation system. Also, I can no longer edit the post in question.

Also, if I get to write my own differentiation rules for unknown functions, what’s to say that I don’t write a rule like $\frac{df}{dx}=\frac{f(x+\delta)-f(x)}{\delta}$?

Is there any non-semantic reason that an algorithmic differentiation system couldn’t use a numeric derivative for a certain group of primitives?

Sure, you could do that. You could also write an incorrect derivative rule if you want. Or your derivative rule could query ChatGPT. Or your function could send an email to your high-school calculus teacher and wait until it receives a response. Code can do lots of things.

There’s nothing physically preventing it. Most people would consider that a bug in an AD system, though.

(A finite difference like that would be noticeable, even without looking inside the code, for two reasons: the wrong performance scaling, since its cost scales in proportion to the number of inputs whereas reverse-mode AD scales in proportion to the number of outputs, and unacceptably large numerical errors.)
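
Both effects can be seen in a small plain-Julia experiment. This is a sketch with hypothetical helper names (`fd`, `fd_gradient`), using the forward-difference rule from the question above. The error first shrinks with the step size (truncation error) and then grows again (rounding error), and a gradient of a function with n inputs needs n + 1 evaluations of that function.

```julia
# Forward finite difference, i.e. the rule proposed in the question.
fd(f, x, delta) = (f(x + delta) - f(x)) / delta

exact = cos(1.0)                                 # derivative of sin at x = 1
err_coarse = abs(fd(sin, 1.0, 1e-2) - exact)     # truncation error dominates
err_fine   = abs(fd(sin, 1.0, 1e-8) - exact)     # roughly the best Float64 allows here
err_tiny   = abs(fd(sin, 1.0, 1e-15) - exact)    # rounding error dominates

# Gradient by finite differences: one base evaluation plus one per input.
function fd_gradient(f, x::Vector{Float64}, delta)
    fx = f(x)
    grad = similar(x)
    for i in eachindex(x)
        xp = copy(x)
        xp[i] += delta               # perturb one input at a time
        grad[i] = (f(xp) - fx) / delta
    end
    return grad
end

fd_gradient(v -> sum(abs2, v), [1.0, 2.0, 3.0], 1e-6)   # close to [2, 4, 6]
```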

Finite differences are clearly good enough in some applications, and perhaps even the only alternative. Should it then be understood that people doing that would have nothing to gain from introducing rule-based derivatives in those places where it’s possible?

You can very often, at the very least, use dual numbers, which are more numerically robust and typically give you higher accuracy.

Only if you consider accuracy and scalability to be nothing. Which maybe they are, if finite differences are good enough and you have other concerns to focus on.

So, if and when they do this, they have implemented an algorithmic differentiation system that partially uses finite differences.

They would then have made a differentiation system that mixes AD and finite differences.

Seems like you are now mainly engaging in semantic bickering, in order to score some far-fetched point. If you think a hybrid system would be useful, then that’s fine, but there’s no reason to insist on it being fully AD.

That is a correct analysis (although I don’t exclude the possibility that a hybrid differentiation system would be useful). The point, the importance of which you so rightly belittle, is that I asked for a non-semantic reason.

The bickering bit could possibly be connected to a certain annoyance that nobody could write the one or two sentences needed to convey the information in my table (possibly without any parenthetical letter), that would have been helpful to understanding both what Zygote does and what the distinction between different differentiation strategies is.

Frankly, I just found the table more confusing because it wasn’t clear what each of the columns/rows/cells meant. In the spirit of a one-liner response though, I remember someone (maybe @ChrisRackauckas?) summarizing it as “AD is like symbolic differentiation, just with = [instead of deep nested expression trees]”. I am probably butchering the quote, but it should clarify that finite differences don’t even enter the picture here.