Zygote is a remarkable tool. The fact that it’s able to perform source-to-source automatic differentiation on almost all of Julia (!) already makes it, I think, one of the most powerful AD backends around. It’s performant, mostly, and honestly just beautiful.
Still, at this point it seems clear that it won’t be able to fulfill all the objectives laid out in its introduction paper. Perhaps most prominently, Zygote remains unable to deal with mutation.
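To make the mutation point concrete, here is a minimal sketch of the kind of code I mean. The function names are made up for illustration, and it assumes Zygote.jl is installed. The two functions compute the same thing, but one writes into a preallocated array.

```julia
using Zygote

# Non-mutating version: differentiates fine.
sum_of_squares(x) = sum(abs2, x)

# Mutating version: same math, but fills a buffer in place.
function sum_of_squares_mutating(x)
    y = similar(x)
    for i in eachindex(x)
        y[i] = x[i]^2      # lowers to setindex!, i.e. array mutation
    end
    return sum(y)
end

x = [1.0, 2.0, 3.0]

grad_ok = Zygote.gradient(sum_of_squares, x)[1]

# As far as I know, Zygote raises an error here saying that mutating
# arrays is not supported, so the call is wrapped in try/catch.
mutation_failed = try
    Zygote.gradient(sum_of_squares_mutating, x)
    false
catch
    true
end
```

The gradient of the first version is just `2 .* x`, while the second is expected to hit the error branch.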
This, along with issues regarding type-stability and some bugs, has led some of the most prominent ecosystems to transition to other backends. It has also led to the development of multiple experimental backends which operate on lower-level representations of Julia code, such as Enzyme.jl and Diffractor.jl.
These difficulties seem to be somehow intrinsically related to how Zygote and Julia work, but I’ve found it very hard to understand exactly why Zygote’s elegant approach doesn’t work as expected. Can anyone put it in simple terms?
Could one hope, for example, for a “fix” to these issues in a Julia 2.0? Would it be reasonable to try and think of a fully differentiable subset of Julia where source-to-source differentiation “just works”, even with mutation?
Its rule system does not allow rules for mutating operations that properly cache what is and is not changed in a minimal way. It relies on generating code for everything and then running dead code elimination, which makes it very hard to differentiate only the minimal amount (and thus to track mutation minimally). And it relies on simplified type inference heuristics for performance. These are all fundamental limitations in its design that cannot be fixed with a change to the language. The AD underpinnings would have to be rewritten to do that.
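A rough illustration of why mutation needs careful caching in reverse mode: the toy rule below is written by hand in plain Julia and is not Zygote's actual internals. The name `square_with_pullback` is hypothetical. The pullback closes over the input array by reference, so an in-place write after the forward pass silently changes what the reverse pass sees.

```julia
# Hand-written reverse-mode rule for y = x .^ 2 (illustrative only).
function square_with_pullback(x)
    y = x .^ 2
    # The pullback captures `x` by reference, not a copy of its values.
    pullback(dy) = 2 .* x .* dy
    return y, pullback
end

x = [1.0, 2.0, 3.0]
y, back = square_with_pullback(x)

grad_before = back(ones(3))   # reads x as it was during the forward pass

x .= 0.0                      # in-place mutation after the forward pass
grad_after = back(ones(3))    # same pullback, now reads the overwritten x
```

Here `grad_before` is `[2.0, 4.0, 6.0]` and `grad_after` is `[0.0, 0.0, 0.0]`. Copying every captured array would fix the result but cost memory everywhere, which is why the rule system needs to know exactly what gets overwritten in order to cache only that.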
The rewrite would have to move the AD to the typed IR level or lower, and then the rule system would have to be changed so that rules take into account which subsets are being differentiated. This rewrite is effectively a description of what Mooncake.jl does differently, and is also the reason why EnzymeRules.jl is not compatible with ChainRules. You basically cannot make it handle all of the cases and be efficient without such a break in the design.
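As a hedged sketch of what "rules that do not know the subset being differentiated" looks like, here is a ChainRulesCore-style reverse rule for a made-up function `scale`. It assumes ChainRulesCore.jl is installed. The `rrule` signature only sees the primal arguments, so nothing in it says whether `a`, `x`, or both are actually wanted.

```julia
using ChainRulesCore

# Hypothetical function used only for this illustration.
scale(a, x) = a .* x

function ChainRulesCore.rrule(::typeof(scale), a::Real, x::AbstractVector)
    y = scale(a, x)
    function scale_pullback(dy)
        dy = unthunk(dy)
        # Both cotangents are produced: the rule has no argument telling it
        # that, say, only `x` is being differentiated.
        da = sum(dy .* x)
        dx = a .* dy
        return NoTangent(), da, dx
    end
    return y, scale_pullback
end

y, back = rrule(scale, 2.0, [1.0, 2.0])
_, da, dx = back([1.0, 1.0])
```

ChainRules does offer thunks to defer some of this work, but the rule still has no way to be told up front which arguments are active. That is the piece the rule systems of the lower-level tools add.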
Because of this, Enzyme and Mooncake are good candidates moving forward. Enzyme has a lot of nice stuff: it already supports Lux and GPUs, for example, and has lots of SciML integration. There are lots of nice wins, like being able to differentiate C and Fortran code in tandem, and GPU kernels. But its error messages come from the LLVM level, so it needs more work there. It is multi-language, so it has a fairly robust dev team.
Mooncake is at the Julia level with somewhat better error messages, but it is newer and has fewer devs and fewer integrations right now. But the devs are active and going to help with the SciML integration, and the rules are similar enough to Enzyme's that once the jump is made to that level of rules (ChainRules to Enzyme is a big complexity jump for rule writing), writing the Mooncake rule isn't so hard.