My question is as simple as it says in the title. Does Zygote differentiate symbolically?
No, Automatic Differentiation is distinct from symbolic differentiation. Zygote can perform forward-mode or reverse-mode automatic differentiation by source rewriting.
If it does, can the results be seen in some way?
Sort of, although it is not very helpful. There is an example in the documentation.
Are there other packages that differentiate symbolically, and if so, can their results be seen in some way?
Symbolics.jl can differentiate symbolic expressions.
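As a minimal sketch of what "seeing the result" looks like with Symbolics.jl (assuming the package is installed; the expression x^2 * sin(x) is only an illustrative choice, not something from this thread):

```julia
using Symbolics

@variables x                      # declare a symbolic variable
expr = x^2 * sin(x)               # hypothetical example expression

# Symbolic differentiation returns another expression, not a number.
dexpr = Symbolics.derivative(expr, x)
println(dexpr)                    # prints a formula in x

# The same derivative written with an explicit differential operator.
D = Differential(x)
dexpr2 = expand_derivatives(D(expr))

# To get a number, substitute a value for x and unwrap the result.
val = Symbolics.value(substitute(dexpr, Dict(x => 1.0)))
```

The point of the sketch is that the derivative exists as a formula you can print and inspect before any number is plugged in.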
How (and where) does Zygote overload the single quote (')?
I didn't know about this! Can you point out an example of the syntax you mean?
In my mind, I imagined that differentiation could be done symbolically or numerically, and that automatic differentiation obviated the need to do it oneself by doing one of them. What does Zygote do exactly (not exactly, a broad outline will do fine)?
Oh, that is Base.adjoint, which can also be spelled '. The easy way to find the new method is to run @edit sin'(0) in a REPL, which will bring up the defining line in your editor.
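For reference, a small sketch of the syntax in question, assuming Zygote is installed and that the installed version still defines the ' method for functions:

```julia
using Zygote

# Zygote adds a method to Base.adjoint for functions, so f' is the derivative of f.
dsin = sin'
d0 = dsin(0.0)              # derivative of sin at 0, i.e. cos(0)

# The same derivative through the explicit API. gradient returns a tuple
# with one entry per argument of the differentiated function.
g0 = gradient(sin, 0.0)[1]
```

Both calls return a plain number; no symbolic expression for cos is ever built.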
This is why I prefer the other reading of the abbreviation AD: Algorithmic Differentiation. I think it reflects the essence better than "automatic differentiation". Anyway, AD is distinct from both symbolic and numerical differentiation; see "Simple numerical differentiation - #5 by zdenek_hurak".
OK, I'm going to summarise what I have gleaned from links in the answers (and to some extent the answers themselves). This is so that others can identify any misunderstandings I might be labouring under.
Let me start by refining my question a bit, so that we do not have diverging ideas about what a given concept or term means in this context. I will be giving a talk about Julia at my university department at some point in the near future. I (and, I think, my audience) want to know whether Zygote uses finite differences to calculate derivatives.
As far as I understand it, derivatives are taken from rules and associated with the corresponding values. These derivatives are then evaluated and accumulated in numeric form.
To evaluate a derivative we need its arguments, meaning that the intermediate results from evaluating the overall function have to be kept if we want to accumulate the derivatives from the outermost function (in ML, typically the loss function) inwards.
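To make the previous two paragraphs concrete, here is a hand-written sketch in plain Julia. It is not how Zygote is implemented; the helper names square_rule, sin_rule and loss_and_gradient are made up for illustration. Each primitive returns its value together with a closure that remembers the inputs it needs for the derivative.

```julia
# Each primitive returns its value and a "pullback" closure. The closure
# captures the input x, which is how the intermediate results are kept.
square_rule(x) = (x^2, dy -> dy * 2x)
sin_rule(x)    = (sin(x), dy -> dy * cos(x))

# loss(x) = sin(x^2), differentiated by hand in reverse mode.
function loss_and_gradient(x)
    a, back_square = square_rule(x)   # forward pass, collecting pullbacks
    y, back_sin    = sin_rule(a)
    da = back_sin(1.0)                # reverse pass starts at the output with seed 1
    dx = back_square(da)              # and continues inwards to the input
    return y, dx
end

y, dydx = loss_and_gradient(1.5)
```

Every derivative rule here is exact (it came from calculus, not from a difference quotient), but only numbers flow through the program.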
A simple explanation of the differences between Algorithmic Differentiation (A), Numeric Differentiation (N) and Symbolic Differentiation (S) could be shown in a figure:
                                                        | sym | num
--------------------------------------------------------+-----+-------
differentiating primitive parts ("leaves in the graph") | A,S | (A),N
accumulating the derivative ("non-leaves")              | S   | A,N
Here sym and num indicate whether the part in question is calculated symbolically or numerically. I surmise that Algorithmic Differentiation could potentially fall back to numerical evaluation of "unfamiliar" nodes, and that is why I put an A in parentheses in the upper right corner.
Apart perhaps from the minutest semantic details, I would like to know if this is a correct description.
No. Zygote and other AD systems never use finite differences.
Essentially, they accumulate vector-Jacobian products, working backwards from outputs to inputs (for reverse-mode AD; forward mode accumulates Jacobian-vector products, working from inputs to outputs). These products are computed by an equivalent of symbolic differentiation for individual computational steps, expressed in low-level compiler building blocks rather than in high-level symbolic expressions/code. If it hits a function call that it cannot analyze (e.g. foreign function calls, mutation, ...), then it fails with an error, and you need to supply a manual vector-Jacobian product (an rrule or "pullback" via ChainRules.jl for Zygote) for that step.
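A manually supplied rule of that kind might look like the following sketch. It assumes ChainRulesCore is installed; opaque_cube is a hypothetical stand-in for a step the AD system cannot analyze, and the rule is called directly here rather than through Zygote.

```julia
using ChainRulesCore

# Hypothetical function standing in for a step the AD system cannot analyze.
opaque_cube(x) = x^3

# Hand-written reverse rule: return the value and a pullback that maps the
# output cotangent dy to cotangents for (the function itself, x).
function ChainRulesCore.rrule(::typeof(opaque_cube), x)
    y = opaque_cube(x)
    opaque_cube_pullback(dy) = (NoTangent(), dy * 3x^2)
    return y, opaque_cube_pullback
end

y, back = rrule(opaque_cube, 2.0)
_, dx = back(1.0)     # vector-Jacobian product with dy = 1
```

The pullback encodes the exact derivative 3x^2; nothing in it perturbs the input and re-evaluates the function.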
So, I conclude that my description was correct. To clarify: I only stated that I could not guarantee that numerical differentiation couldn't be used in some algorithmic differentiation system. Also, I can no longer edit the post in question.
Also, if I get to write my own differentiation rules for unknown functions, what's to say that I don't write a rule like $\frac{df}{dx}=\frac{f(x+\delta)-f(x)}{\delta}$?
Is there any non-semantic reason that an algorithmic differentiation system couldn't use a numerical derivative for a certain group of primitives?
Sure, you could do that. You could also write an incorrect derivative rule if you want. Or your derivative rule could query ChatGPT. Or your function could send an email to your high-school calculus teacher and wait until it receives a response. Code can do lots of things.
There's nothing physically preventing it. Most people would consider that a bug in an AD system, though.
(A finite difference like that would be noticeable, even without looking inside the code, for having the wrong performance scaling: its cost scales in proportion to the number of inputs, whereas reverse-mode AD scales in proportion to the number of outputs. It would also have unacceptably large numerical errors.)
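Both effects can be observed with a few lines of plain Julia. This is only an illustrative sketch: fd_gradient, the step size 1e-6 and the test function are arbitrary choices, not taken from any package.

```julia
# Forward-difference gradient: one extra evaluation of f per input.
function fd_gradient(f, x::Vector{Float64}; delta = 1e-6)
    fx = f(x)
    g = similar(x)
    for i in eachindex(x)
        xp = copy(x)
        xp[i] += delta
        g[i] = (f(xp) - fx) / delta
    end
    return g
end

# Count evaluations to make the cost visible.
const ncalls = Ref(0)
f(x) = (ncalls[] += 1; sum(abs2, x))      # the exact gradient is 2x

x = collect(1.0:5.0)
g = fd_gradient(f, x)
err = maximum(abs.(g .- 2 .* x))          # truncation plus round-off error
println("evaluations: ", ncalls[], ", max error: ", err)
```

With n inputs the function is evaluated n + 1 times, and the result is only approximately equal to 2x, whereas a rule-based derivative of sum(abs2, x) is exact up to floating-point rounding.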
Finite differences are clearly good enough in some applications, and are perhaps even the only alternative. Should it then be understood that people doing that would have nothing to gain from introducing rule-based derivatives in those places where it's possible?
Only if you consider accuracy and scalability to be nothing. Which maybe they are, if finite differences are good enough and you have other concerns to focus on.
They would then have made a differentiation system that mixes AD and finite differences.
Seems like you are now mainly engaging in semantic bickering, in order to score some far-fetched point. If you think a hybrid system would be useful, then that's fine, but there's no reason to insist on it being fully AD.
That is a correct analysis (although I don't exclude the possibility that a hybrid differentiation system would be useful). The point, the importance of which you so rightly belittle, is that I asked for a non-semantic reason.
The bickering bit could possibly be connected to a certain annoyance that nobody could write the one or two sentences needed to convey the information in my table (possibly without any parenthetical letter), that would have been helpful to understanding both what Zygote does and what the distinction between different differentiation strategies is.
Frankly, I just found the table more confusing because it wasn't clear what each of the columns/rows/cells meant. In the spirit of a one-liner response though, I remember someone (maybe @ChrisRackauckas?) summarizing it as "AD is like symbolic differentiation, just with = [instead of deep nested expression trees]". I'm probably butchering the quote, but it should clarify that finite differences don't even enter the picture here.
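One way to read that one-liner, as a hand-written sketch in plain Julia (the function and its name are made up for illustration, and the derivative is propagated forwards here for simplicity): each assignment gets its own exact derivative line, instead of one large expanded formula.

```julia
# f(x) = x^2 * sin(x^2), written as a sequence of assignments.
# Each value is paired with the derivative of that value with respect to x.
function f_with_derivative(x)
    a = x^2;     da = 2x
    b = sin(a);  db = cos(a) * da
    c = a * b;   dc = da * b + a * db
    return c, dc
end

# The fully expanded formula, as symbolic differentiation would produce it.
f_prime_expanded(x) = 2x * sin(x^2) + x^2 * cos(x^2) * 2x

val, dval = f_with_derivative(0.7)
```

Both routes apply the same calculus rules; the assignment form just reuses intermediate values such as a and da instead of repeating them inside a nested expression.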