Just a follow-up to this topic: I am experimenting with incorporating many of the above tools into a common framework for computing gradients via AD (primarily for use in Bayesian inference).
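To make that concrete, here is a minimal sketch of the kind of common interface I have in mind. The `ADBackend` types and `gradient_via` are hypothetical names for illustration, not an existing API:

```julia
using Zygote, ReverseDiff

# Hypothetical backend markers; dispatching on these selects the AD package.
abstract type ADBackend end
struct ZygoteAD <: ADBackend end
struct ReverseDiffAD <: ADBackend end

# One entry point, several backends.
gradient_via(::ZygoteAD, f, x::AbstractVector) = first(Zygote.gradient(f, x))
gradient_via(::ReverseDiffAD, f, x::AbstractVector) = ReverseDiff.gradient(f, x)

# Example target: a log-density-like function, as in Bayesian inference.
logdensity(x) = -0.5 * sum(abs2, x)

gradient_via(ZygoteAD(), logdensity, [1.0, 2.0])       # [-1.0, -2.0]
gradient_via(ReverseDiffAD(), logdensity, [1.0, 2.0])  # [-1.0, -2.0]
```

The point of the dispatch-based design is that swapping the backend is a one-argument change, so when one package breaks on a given model it is cheap to fall back to another.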
I found that:
- When Zygote.jl works, it is amazing. But it is experimental, and the usual caveats apply.
- Flux.jl is pretty good for reverse-mode AD, but when it breaks, I find debugging difficult.
- ReverseDiff.jl is pretty reliable, except when it is missing AD rules for some methods; these then need to be defined manually (see the sketch after this list).
- In general, looking at the discussions around some of the AD packages, there is a sentiment that great AD tools are just around the corner, and that maintaining or developing the existing ones is perhaps wasted effort (e.g. #81 in Nabla.jl, ReverseDiff.jl's README, and some others; I won't link all of them). While this is understandable, and projects based on e.g. Cassette.jl are indeed very promising, I think there would be value in keeping the existing tools working during the transition period, which may take many months, if not years.
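Regarding the ReverseDiff.jl point above: defining a missing rule typically follows the `ReverseDiff.track` / `ReverseDiff.@grad` pattern. A minimal sketch, where `mysum` is a hypothetical stand-in for a method that ReverseDiff cannot differentiate through on its own:

```julia
using ReverseDiff
using ReverseDiff: TrackedArray, track, value, @grad

# Pretend this is a "black box" with no AD rule.
mysum(x::AbstractVector) = sum(x)

# Intercept tracked arrays and record the call on the tape...
mysum(x::TrackedArray) = track(mysum, x)

# ...then supply the primal value and a pullback: given the output
# adjoint Δ (a scalar here), return the input adjoints as a tuple.
@grad function mysum(x)
    xv = value(x)
    return mysum(xv), Δ -> (fill(Δ, size(xv)),)
end

# Sanity check: d/dx sum(x.^2) = 2x
ReverseDiff.gradient(x -> mysum(x .^ 2), [1.0, 2.0, 3.0])  # [2.0, 4.0, 6.0]
```

It is not hard once you have seen the pattern, but it does mean every missing rule is a small maintenance task, which is part of why I think keeping these packages alive is worthwhile.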