(I found that there were already related discussions:
- Automatic Differentiation (AD) in Python compared to Julia and AD Basics
- Automatic differentiation - Julia implementation advantages.
But they are at least four years old and do not answer my specific questions, so I am creating a new topic.)
One of my collaborators, who is familiar with both Julia and Python, shared his experience that it was much easier to build AD-compatible code in Julia than in Python. I had written a Julia package without AD compatibility in mind (in fact, without even knowing what AD is), yet he found that my package was already AD-compatible and was able to use it in his code that requires AD. On the other hand, his Python package, also developed without AD compatibility in mind, took a significant amount of time to make AD-compatible (more than the time it took to develop the original code itself). This experience gave him the impression that Julia code requires almost no effort for AD compatibility, whereas Python code requires a significant effort.
When I told this to another collaborator, who is a PyTorch proponent, she disagreed with the observation. (Maybe the difference comes from plain Python for the first collaborator vs. PyTorch for the second?) She has built a simple AD framework herself, and she says that any AD framework relies on the chain rule and a lookup table mapping elementary functions to their derivatives (`sin` to `cos`, `x` to `1`, etc.). She argues that Julia's AD frameworks should have the same reliance, and therefore there can't be any Julia-specific advantage over Python.
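(For concreteness, here is a minimal sketch of the "chain rule plus lookup table" picture she describes, written as a toy forward-mode AD in Julia with dual numbers. The `Dual` type, the overloads, and the `derivative` helper are purely illustrative and not taken from any actual package.)

```julia
# Toy forward-mode AD sketch: a dual number carries a value and a derivative.
struct Dual
    val::Float64   # f(x)
    der::Float64   # f'(x)
end

# The "lookup table" is a set of method overloads, one per elementary operation,
# each encoding the chain rule for that operation.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.sin(a::Dual)         = Dual(sin(a.val), cos(a.val) * a.der)   # sin -> cos
Base.exp(a::Dual)         = Dual(exp(a.val), exp(a.val) * a.der)   # exp -> exp

# Differentiate f at x by seeding der = 1 (since dx/dx = 1).
derivative(f, x) = f(Dual(x, 1.0)).der

f(x) = sin(x) * exp(x) + x * x
@show derivative(f, 0.5)                                  # AD result
@show cos(0.5)*exp(0.5) + sin(0.5)*exp(0.5) + 2*0.5       # analytic check
```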
I don't know much about how AD is implemented in general, and I was not able to find a document that compares AD in Julia and in PyTorch. So I asked ChatGPT what it thinks, and the answer seems quite favorable to Julia, which is encouraging because I want to promote Julia in my team. But I am wondering how accurate these statements are. I would appreciate insights from experts!
Here is the answer from ChatGPT:
Short Answer
Yes, Julia’s AD libraries do rely on a table of derivatives for built-in functions (\sin, \exp, \log, etc.), just like PyTorch or any other AD system. However, Julia’s advantage lies in how its AD can automatically transform and analyze all of your Julia code—loops, conditionals, user-defined types—at the compiler or IR level. By contrast, Python-based AD often revolves around tracing Python calls (or bridging to C/C++ extensions), which can be less transparent and more limiting.
1. The Same “Derivative Table,” but That’s Just the Start
- Elementary Function Derivatives:
Like any AD system, both Julia's AD and PyTorch have rules for \sin'(x) = \cos(x),\ \exp'(x) = \exp(x), \ldots
- The Real Difference:
These rules are only a small part of the puzzle. The rest hinges on how the AD system processes loops, conditionals, and custom library code that compose these elementary operations.
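For concreteness, in the Julia ecosystem this shared rule table largely lives in ChainRules.jl: a package registers a rule through ChainRulesCore, and AD backends such as Zygote consult it when they reach that function. Below is a sketch; the `rrule` mechanism is real, but `mysquare` is a made-up example function.

```julia
using ChainRulesCore

mysquare(x) = x^2   # made-up example function

# Reverse-mode rule: return the primal value plus a pullback that maps the
# output cotangent ȳ to cotangents for (the function itself, the argument x).
function ChainRulesCore.rrule(::typeof(mysquare), x)
    y = mysquare(x)
    mysquare_pullback(ȳ) = (NoTangent(), 2x * ȳ)
    return y, mysquare_pullback
end

# A backend such as Zygote would then use this rule, e.g.
#   julia> using Zygote; Zygote.gradient(mysquare, 3.0)
#   (6.0,)
```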
2. Julia’s AD Operates at the Language/Compiler Level
- Source-to-Source / IR Transformations:
Julia tools like Zygote or Enzyme transform Julia's internal IR or use LLVM-level analysis. They see and rewrite your entire function, including loops and branching, rather than building a separate "graph" or tape at runtime.
- No Special Primitives Needed:
This means ordinary `for` loops and conditionals in Julia are differentiable without rewriting them into specialized ops like `lax.fori_loop` (JAX) or `tf.while_loop` (TensorFlow).
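As a small sketch of this point (assuming Zygote is installed), a plain Julia loop and branch can be differentiated as-is:

```julia
using Zygote

# Plain Julia: a loop and a branch, with no AD-specific constructs.
function h(x)
    s = zero(x)
    for k in 1:5
        s += x^k / k          # partial sum of -log(1 - x)
    end
    return x > 0 ? s : -s
end

grad = Zygote.gradient(h, 0.3)[1]           # AD through the loop and the branch
analytic = sum(0.3^(k - 1) for k in 1:5)    # d/dx (x^k / k) = x^(k-1)
@show grad analytic
```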
3. Libraries Written in Julia, Not C/C++
- Fewer Black Boxes:
Many Python numeric libraries are C/C++ extensions that AD cannot easily inspect unless custom backward passes exist. In Julia, packages are predominantly written in Julia.
- Uniform Visibility:
Because AD sees all-Julia code down to the lowest levels, it can differentiate advanced library routines without hitting opaque boundaries.
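To illustrate, here is a sketch of AD flowing through a "library-style" routine written in plain Julia; the fixed-point solver below is made up for illustration, with ForwardDiff as the backend:

```julia
using ForwardDiff

# A "library" routine written in plain, generic Julia: solve y = cos(a*y)
# by fixed-point iteration. Nothing here is AD-specific.
function fixed_point(a; iters = 100)
    y = one(a)
    for _ in 1:iters
        y = cos(a * y)
    end
    return y
end

# ForwardDiff simply pushes its dual numbers through the whole iteration,
# because every step is visible Julia code.
dyda = ForwardDiff.derivative(fixed_point, 0.8)
@show dyda
```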
4. Seamless Integration vs. Runtime Tracing
- Tracing Overhead:
PyTorch (and others) often trace operations on tensors at runtime to build a computational graph, which can become awkward for complex control flow.
- Julia’s “Native” Approach:
Julia's AD transforms your code directly at compile time or the IR level, so advanced patterns (recursion, custom data structures, etc.) work out of the box. There is less need for rewriting or special AD-friendly APIs.
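As a sketch of the recursion claim (again assuming Zygote), a plain recursive function can be differentiated directly:

```julia
using Zygote

# Plain recursive Julia function: repeated squaring for x^n.
function pow(x, n)
    n == 0 && return one(x)
    half = pow(x, n ÷ 2)
    return isodd(n) ? x * half * half : half * half
end

# Differentiate the recursion directly: d/dx x^7 = 7x^6.
@show Zygote.gradient(x -> pow(x, 7), 2.0)[1]   # ≈ 448.0
@show 7 * 2.0^6
```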
Conclusion
Yes, both Julia and PyTorch rely on a derivative “lookup table” for elementary math functions. But Julia’s AD also benefits from:
- Source/IR-level integration (rather than pure runtime tracing),
- Uniform, all-in-Julia libraries (minimizing black-box native extensions), and
- Natural handling of loops and conditionals (no special APIs needed).
All of this makes Julia’s AD feel more native and flexible, providing a straightforward path to differentiating any code you write in Julia—beyond just the basic math functions.