Learning automatic differentiation

I am a beginner at programming in Julia and am studying the basics of deep learning with Flux and Zygote. In my study I have only encountered a few examples of backprop, but I realized (by reading the discussions here and watching videos on YouTube, such as Mike Innes on the Julia Language channel) that automatic differentiation is an extremely broad subject, and I know nothing about it.

Where did you study the topics regarding automatic differentiation that you apply in Julia?
What do I need to study to understand how automatic differentiation libraries like Zygote or ForwardDiff work?
Are there any books that also have examples in julia?
Thanks, sorry for my English.

5 Likes

Maybe worth starting with GitHub - MikeInnes/diff-zoo: Differentiation for Hackers

There are a variety of good academic reviews of the subject, tutorials and textbooks if you want to learn more.

10 Likes

Thank you.
I’m a beginner; which textbook or tutorial would you recommend?

1 Like

Maybe start with https://www.jmlr.org/papers/volume18/17-468/17-468.pdf

7 Likes

My recommended reading/watching list is here in the ChainRules docs, at the bottom of the page.

I have been told that the ChainRules docs themselves are quite educational.
We definitely tried hard to be, though I think they could be better.

6 Likes

Are there any books that also have examples in julia?

There are no good books on Automatic Differentiation. Let alone ones that have examples in Julia.
Griewank and Walther 2008 “Evaluating Derivatives” is fairly comprehensive (for reverse mode), but not all that readable.
The alternatives that I am aware of are neither comprehensive nor readable.

2 Likes

I had a very similar experience. I’m still learning Automatic Differentiation (AD), but here are some resources and links that helped my understanding.

When I started going through the texts/lectures/documents describing AD, at first I had a hard time understanding the difference between Forward Mode AD and Reverse Mode AD and how the different techniques relate to each other.

So here’s a basic overview video on YouTube that I liked and that helped me get a broad view of the topic.

What is Automatic Differentiation? by Ari Seff

  • I found this video helped clarify the difference between Forward Mode and Reverse Mode and how their use cases can differ (the short sketch after this list shows both modes in Julia).
  • It’s a fairly broad overview, but it helped me better understand the context of everything.
  • The video also has an introduction to how AD differs from symbolic and numerical differentiation.
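Here’s a minimal sketch of that difference (assuming ForwardDiff.jl and Zygote.jl are installed; the function f and input x are just made up for illustration):

```julia
# Minimal sketch: the same scalar-valued function differentiated with
# forward-mode (ForwardDiff.jl) and reverse-mode (Zygote.jl) AD.
using ForwardDiff, Zygote

f(x) = sum(abs2, x) + prod(x)       # toy function R^n -> R

x = [1.0, 2.0, 3.0]

g_fwd = ForwardDiff.gradient(f, x)  # forward mode: pushes dual numbers through f
g_rev = Zygote.gradient(f, x)[1]    # reverse mode: builds a pullback, then runs it backwards

g_fwd ≈ g_rev                       # true; the modes agree, they just scale differently
```

Roughly, forward mode costs one pass per input direction and reverse mode one pass per output, which is why reverse mode wins for many-inputs-to-one-scalar problems like neural-network losses.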

After I got the gist of how forward-mode and reverse-mode are different, I got confused about the difference between Backpropagation and Reverse Mode AD. From what I learned, Backpropagation is a specific case of Reverse Mode AD: reverse mode is more general and can handle multiple outputs, whereas backpropagation is usually described for a single scalar output.
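Here’s a small sketch of that distinction (assuming Zygote.jl; the names loss, h, W, and x are made up for the example):

```julia
# Sketch: "backprop" is reverse-mode AD applied to one scalar output (a loss);
# for vector-valued outputs reverse mode assembles a Jacobian, one pullback per output.
using Zygote

W = randn(2, 3); x = randn(3)

loss(W) = sum(abs2, W * x)          # scalar output: the classic backprop setting
∇W = Zygote.gradient(loss, W)[1]    # one backward pass gives the full gradient w.r.t. W

h(x) = tanh.(W * x)                 # vector output: R^3 -> R^2
J = Zygote.jacobian(h, x)[1]        # 2×3 Jacobian, built from one pullback per output row
```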

Then I also realized that I’d already seen Reverse Mode AD in Python before, with backprop in machine learning algorithms. So that got me thinking about what the differences are between the Python and Julia AD ecosystems. There was a great conversation about that on Slack. Feel free to check here:

Now that I have a better contextual understanding, I’m starting to dive deeper into the reading list that Lyndon mentioned in the ChainRules docs, trying to deepen my understanding of how AD works.

Let me know if there are any questions or any corrections to what I said. I’m still learning, but I hope this helps!

6 Likes

Thanks to everyone, every one of your responses has been helpful.

1 Like

reverse mode is more general and can handle multiple outputs, whereas backpropagation is usually described for a single scalar output

Could be, but remember the field is absolutely terrible at consistency of naming things.
Probably because it’s good at reinventing things and giving them new names.

Of historical interest:
From 1991 (when the term was created) until about 2012, backpropagation in neural networks was done “by hand”:
you coded a rule for each layer in the network and then composed them.
That is in contrast to automatic differentiation, which decomposes any function it doesn’t have a rule for into parts that it does.

It’s all applying the chain rule “backwards”.
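To make that concrete, here’s a sketch of the “by hand” style, where each layer carries its own backward rule and the rules are composed in reverse order (hypothetical helper names like dense_fwd and sigmoid_fwd; Zygote/ChainRules do this decomposition and composition for you):

```julia
# Sketch of hand-coded backprop: each "layer" returns its output plus a pullback,
# and the pullbacks are applied in reverse order (the chain rule, run backwards).
σ(z) = 1 / (1 + exp(-z))

dense_fwd(W, x) = (W * x,  z̄ -> (z̄ * x', W' * z̄))          # hand-written rule for a dense layer
sigmoid_fwd(z)  = (σ.(z), ȳ -> ȳ .* σ.(z) .* (1 .- σ.(z)))  # hand-written rule for elementwise sigmoid

W = randn(2, 3); x = randn(3)

z, back_dense   = dense_fwd(W, x)   # forward pass, keeping each pullback
y, back_sigmoid = sigmoid_fwd(z)

ȳ = ones(length(y))                 # seed: gradient of sum(y) with respect to y
z̄ = back_sigmoid(ȳ)                 # backwards through the sigmoid
W̄, x̄ = back_dense(z̄)                # backwards through the dense layer
```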

3 Likes