TF Eager, PyTorch, and older Julia tools (Tracker, ReverseDiff, etc.) use a tape. Tapes are a bit slow because they add overhead to the forward pass, and they cannot optimize the backwards pass very well because the tape can change from call to call (every new input value can give a new tape). Tensorflow 1.x and Jax work on quasi-static codes, i.e. codes that can be expanded into a static compute graph (not surprising that they share this limitation, given they are both built heavily on the same XLA IR). This is very fast, but it limits the kinds of codes that can be handled, or at least optimized: fully dynamic codes fall outside that domain.
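For concreteness, here is a small hypothetical sketch (the function and constants are made up purely for illustration) of what "fully dynamic" code looks like: which branch runs and how many loop iterations execute both depend on runtime values, so there is no single static graph to expand into, and a tape only records the one path taken on a given call.

```julia
# Hypothetical example of fully dynamic code: both the branch taken and the
# number of loop iterations depend on the runtime value of `x`, so the
# computation cannot be expanded into one fixed compute graph ahead of time,
# and a tape records only the single path executed on a given forward pass.
function dynamic(x)
    y = x > 0 ? sin(x) : x^2     # data-dependent branch
    while y < 1.0                # value-dependent trip count
        y += 0.3
    end
    return y
end
```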
Newer Julia tools (Zygote, Diffractor, Enzyme, etc.) work directly on the lowered code, doing a source-to-source transformation. Unlike a tape, this generated source contains all of the branches, meaning you can fully JIT compile and optimize it without much issue (it won’t change for different forward values). And unlike the XLA-based tools, it can fully optimize code that includes value-dependent while loops (yes, Jax can represent some of these things, but notice that its JIT compilation and/or reverse-mode AD passes are not compatible with these features, because they are beyond the quasi-static domain it generally allows). These tools do have a harder time optimizing some operations as much as XLA because of the much wider program space they tackle, but they get to rely on the Julia compiler itself and its optimization passes, which are steadily improving over time.
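As a rough illustration of what the source-to-source approach buys you, a sketch along these lines (the particular function is an assumption, chosen only because its loop trip count depends on the input) can be differentiated directly with Zygote:

```julia
using Zygote

# The number of `while` iterations depends on the runtime value of `x`,
# yet the source-to-source transform produces a pullback that covers all
# branches and handles whichever path the forward pass actually takes.
function f(x)
    y = x
    while y < 100.0          # value-dependent trip count
        y = y^2 + 1.0
    end
    return y
end

Zygote.gradient(f, 1.1)      # returns a 1-tuple containing df/dx
```

Because the pullback is ordinary Julia code containing the same control flow, it can be compiled and optimized once rather than re-traced for every new input value.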
Are there cases which thread this needle, where PyTorch, Jax, and Tensorflow wouldn’t optimize the method but the Julia tools would? I wrote a blog post detailing one such architecture: Useful Algorithms That Are Not Optimized By Jax, PyTorch, or Tensorflow - Stochastic Lifestyle. But more importantly, once you see one example it’s pretty clear what kinds of algorithms will have that property.
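To give a flavor of that property (this is only a minimal sketch in the same spirit, not the architecture from the post; the map `g` and the tolerance are arbitrary choices for illustration), consider an iterative solver that runs until a runtime error estimate drops below a tolerance, so the amount of work is itself decided by the values being computed:

```julia
using Zygote

# Sketch of an iterate-to-tolerance algorithm: the stopping criterion is
# value-dependent, so the computation performed differs from input to input.
function fixed_point(p; tol = 1e-10)
    g(u) = cos(p * u)            # contraction map parameterized by p
    u, resid = 0.0, Inf
    while resid > tol            # stop only when the runtime residual is small
        unew = g(u)
        resid = abs(unew - u)
        u = unew
    end
    return u
end

# Differentiate the converged result with respect to the parameter by
# differentiating straight through the adaptive loop.
Zygote.gradient(p -> fixed_point(p), 0.5)
```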