It’s going really well! Enzyme is looking to become the general AD, IMO, given that it has such a wide surface of support. That said, whether Enzyme is right for you (or for machine learning) is really a binary thing: it depends entirely on what your code does. Right now it doesn’t fully support code that hits the GC or dynamic dispatch. Part of this delay was because Valentin (one of the biggest contributors) was just off… adding native precompilation caching to Julia (https://github.com/JuliaLang/julia/pull/47184). So can’t be mad about that. But if your code does hit the GC or dynamic dispatch, you can’t count on it working, since parts of that are not quite supported yet, which basically means “there be dragons” for now, and I would only suggest using it on non-allocating, fully inferred code.
That being said, it’s the default AD used inside SciMLSensitivity.jl these days; it’s extremely fast, supports mutation, and is robust within the confines of those two caveats above. Its rules system is mostly worked out: it’s just a question of making it less tedious in the context of activity analysis.
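To make the “non-allocating, fully inferred” sweet spot concrete, here’s a minimal sketch of the kind of code Enzyme handles well today (`square!` is just a made-up kernel; double-check the exact `autodiff` signature against the Enzyme.jl docs for your version):

```julia
using Enzyme

# In-place kernel: no allocations, fully type-inferred
function square!(y, x)
    for i in eachindex(x, y)
        y[i] = x[i]^2
    end
    return nothing
end

x  = [1.0, 2.0, 3.0]
dx = zeros(3)   # shadow of x: accumulates dL/dx
y  = zeros(3)
dy = ones(3)    # seed: dL/dy for L = sum(y)

# Reverse-mode AD straight through the mutating function
Enzyme.autodiff(Reverse, square!, Const, Duplicated(y, dy), Duplicated(x, dx))

dx  # ≈ 2 .* x
```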
Enzyme core is growing in contributors, and there’s an Enzyme conference coming up.
They received the Best Student Paper award at Supercomputing 2022:
https://www.csail.mit.edu/news/mit-csail-phd-students-receive-best-student-paper-supercomputing-2022
So with that kind of momentum, a growing contributor base (shared, at the LLVM level, with contributors from the Rust community), and a solid foundation that supports mutation from the get-go, it’s really on the right path to be a full language-wide AD system. It’s not quite there yet, but it has the momentum to become the new foundation.
In the meantime, using Zygote and defining adjoint rules on your mutating operations that just call Enzyme isn’t a bad option.
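That pattern looks roughly like the sketch below. It’s a hypothetical example (`mul2!` and `double` are made-up names, and the exact Enzyme activity annotations may need tweaking for your version), but the idea is: give the non-mutating wrapper a ChainRules `rrule`, and have its pullback call Enzyme on the mutating kernel:

```julia
using Zygote, Enzyme, ChainRulesCore

# A mutating kernel that Zygote can't trace directly
function mul2!(y, x)
    @. y = 2 * x
    return nothing
end

# Non-mutating wrapper exposed to Zygote
double(x) = (y = similar(x); mul2!(y, x); y)

# Custom adjoint: the pullback hands the mutating kernel to Enzyme
function ChainRulesCore.rrule(::typeof(double), x)
    y = double(x)
    function double_pullback(ȳ)
        dx = zero(x)
        dy = collect(ȳ)   # seed; Enzyme consumes it in place
        Enzyme.autodiff(Reverse, mul2!, Const,
                        Duplicated(copy(y), dy), Duplicated(x, dx))
        return NoTangent(), dx
    end
    return y, double_pullback
end

Zygote.gradient(x -> sum(double(x)), [1.0, 2.0, 3.0])  # ([2.0, 2.0, 2.0],)
```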
That said…
The most fun is the new AD work:
StochasticAD.jl is based on a new form of automatic differentiation that extends AD to discrete stochastic programs.
https://arxiv.org/abs/2210.08572
This allows things like agent-based models and particle filters to be differentiated automatically.
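As a rough sketch of what that enables (using the package’s documented `derivative_estimate` entry point; treat the exact call as an assumption): you can get unbiased derivative estimates of the expectation of a program whose individual samples are discrete:

```julia
using StochasticAD, Distributions, Statistics

# Each sample is an integer, but E[f(p)] = 10p is smooth in p,
# so d/dp E[f(p)] = 10.
f(p) = rand(Binomial(10, p))

derivative_estimate(f, 0.5)  # one unbiased single-sample estimate of d/dp E[f(p)]

# Average many estimates to cut the variance
mean(derivative_estimate(f, 0.5) for _ in 1:10_000)  # ≈ 10
```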
Additionally, there’s a new ForwardDiff-like AD being developed for higher-order AD.
It also adds some vector-based rules that ForwardDiff doesn’t have, which lets it handle neural networks and linear algebra well.
It’s still under heavy development, but it avoids the compiler-level machinery that generally makes AD harder to build, so it should get up to speed relatively quickly.
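For context on what “ForwardDiff-like for higher-order AD” is improving on: with ForwardDiff today you get higher derivatives by nesting dual numbers, which works but gets expensive as the order grows; a purpose-built higher-order forward mode with its own vector rules avoids that nesting. The nested version looks like this (standard ForwardDiff, not the new package):

```julia
using ForwardDiff

f(x) = sin(x) * exp(x)

# Higher-order derivatives via nested duals (dual-of-dual numbers)
d1(x) = ForwardDiff.derivative(f, x)   # f'(x)
d2(x) = ForwardDiff.derivative(d1, x)  # f''(x)

d2(1.0)  # ≈ 2 * cos(1) * exp(1)
```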