I had a very similar experience. I’m still learning Automatic Differentiation (AD), but here are some resources and links that helped my understanding.
When I started going through the texts/lectures/documents describing AD, I at first had a hard time understanding the difference between Forward Mode AD and Reverse Mode AD, and how the different techniques relate to each other.
So here’s an introductory video on YouTube that I liked and that helped me get a broad overview of the topic.
What is Automatic Differentiation? by Ari Seff
- I found this video helped clarify the difference between Forward Mode and Reverse Mode and how their use cases differ (I’ve put a small forward-mode sketch right after this list).
- It’s a fairly broad overview, but it helped me better understand how everything fits together.
- The video also has an introduction to how AD differs from symbolic and numerical differentiation.
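To make the forward-mode side of that concrete for myself, I wrote a tiny dual-number sketch in plain Julia. This is just my own toy example (scalar `Float64` only, a handful of operations), not anything from the video or a real package:

```julia
# Minimal forward-mode AD with dual numbers (toy example, scalar Float64 only).
struct Dual
    val::Float64   # primal value
    der::Float64   # derivative (tangent) carried alongside the value
end

# Each primitive operation propagates the tangent via the chain rule.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# Differentiate f(x) = x * sin(x) at x = 2 by seeding the input tangent with 1.
f(x) = x * sin(x)
y = f(Dual(2.0, 1.0))
println((y.val, y.der))   # f(2) and f'(2) = sin(2) + 2cos(2)
```

As far as I understand, packages like ForwardDiff.jl are built on essentially this idea, just implemented much more carefully and generically.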
After I got the gist of how forward mode and reverse mode differ, I got confused about the difference between Backpropagation and Reverse Mode AD. From what I learned, backpropagation is a specific case of reverse-mode AD: reverse mode is more general and can handle functions with multiple outputs, whereas backpropagation refers to reverse mode applied to a single scalar output (like a loss function).
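To convince myself of that, I sketched a very small reverse-mode AD in plain Julia (again a toy of my own, assuming scalar nodes and only a few operations). Seeding the single scalar output’s gradient with 1 and sweeping backwards is exactly the backpropagation step:

```julia
# Minimal reverse-mode AD on a computation graph (toy example, scalar nodes only).
mutable struct Node
    value::Float64
    grad::Float64
    parents::Vector{Tuple{Node,Float64}}  # (parent node, local partial derivative)
end
Node(v::Real) = Node(float(v), 0.0, Tuple{Node,Float64}[])

# The forward pass records each operation's local derivatives on the graph.
Base.:+(a::Node, b::Node) = Node(a.value + b.value, 0.0, [(a, 1.0), (b, 1.0)])
Base.:*(a::Node, b::Node) = Node(a.value * b.value, 0.0, [(a, b.value), (b, a.value)])
Base.sin(a::Node) = Node(sin(a.value), 0.0, [(a, cos(a.value))])

# The reverse pass seeds the single scalar output with 1 and accumulates
# gradients in reverse topological order -- this is the "backpropagation" sweep.
function backward!(out::Node)
    order, seen = Node[], Set{Node}()
    function visit(n)
        n in seen && return
        push!(seen, n)
        foreach(p -> visit(p[1]), n.parents)
        push!(order, n)
    end
    visit(out)
    out.grad = 1.0
    for n in reverse(order), (p, d) in n.parents
        p.grad += n.grad * d
    end
end

# Example: f(x, y) = x*y + sin(x); one backward pass gives both partials at once.
x, y = Node(2.0), Node(3.0)
z = x * y + sin(x)
backward!(z)
println((x.grad, y.grad))   # (y + cos(x), x) evaluated at (2, 3)
```

Real frameworks implement this idea in different ways (e.g. PyTorch records the graph at runtime, while Zygote.jl transforms the source code), but the seed-the-output-with-1-and-sweep-backwards structure is the same.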
Then I also realized that I’d already seen reverse-mode AD in Python before, with backprop in machine learning algorithms. That got me thinking about how the Python and Julia AD ecosystems differ. There was a great conversation on Slack about that. Feel free to check here:
Now that I have a better contextual understanding, I’m starting to dive deeper into the reading list that Lyndon mentioned in the ChainRules docs, to deepen my understanding of how AD actually works.
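One thing that helped while reading the ChainRules docs was trying to write a custom rule myself. Here’s a rough sketch of my understanding of the `rrule` interface, for a made-up `square` function (the ChainRules docs are the authoritative reference, so treat this as my best attempt rather than gospel):

```julia
using ChainRulesCore

# A made-up function we want to give a hand-written reverse-mode rule.
square(x) = x^2

# `rrule` returns the primal result plus a pullback that maps the output
# cotangent ȳ back to cotangents for each argument (the first slot is for
# the function itself, which has no derivative here, hence NoTangent()).
function ChainRulesCore.rrule(::typeof(square), x)
    y = square(x)
    square_pullback(ȳ) = (NoTangent(), 2x * ȳ)
    return y, square_pullback
end

# Calling the rule directly: primal value, then the pullback applied to a seed of 1.
y, pullback = rrule(square, 3.0)
println(y)              # 9.0
println(pullback(1.0))  # (NoTangent(), 6.0)
```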
Let me know if you have any questions or any corrections to what I said. I’m still learning, but I hope this helps!