Which autodiff to currently use for a neural network backend?

Just a follow-up to this topic: I am experimenting with incorporating many of the above tools into a common framework that handles AD for gradient computation (primarily for use in Bayesian inference).
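
To make the "common framework" idea concrete, here is a minimal sketch of the kind of backend-agnostic interface I mean. The `ADBackend` types and `value_and_gradient` are hypothetical names for illustration; only `ForwardDiff.gradient` and `ReverseDiff.gradient` are actual package calls.

```julia
using ForwardDiff, ReverseDiff

# Hypothetical backend tags; swapping backends is then a one-line change.
abstract type ADBackend end
struct ForwardDiffAD <: ADBackend end
struct ReverseDiffAD <: ADBackend end

# Return the log density ℓ and its gradient at x, using the chosen backend.
value_and_gradient(::ForwardDiffAD, ℓ, x) = (ℓ(x), ForwardDiff.gradient(ℓ, x))
value_and_gradient(::ReverseDiffAD, ℓ, x) = (ℓ(x), ReverseDiff.gradient(ℓ, x))

# Example: a toy (unnormalized) Gaussian log density.
ℓ(x) = -sum(abs2, x) / 2
value_and_gradient(ReverseDiffAD(), ℓ, randn(3))
```

Supporting another AD package then amounts to adding one more method, which keeps the choice of backend orthogonal to the inference code.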

I found that

  1. When Zygote.jl works, it is amazing. But it is experimental, and the usual caveats apply.

  2. Flux.jl is pretty good for reverse-mode AD, but when it breaks, I find it difficult to debug.

  3. ReverseDiff.jl is pretty reliable, except when it is missing AD rules for particular methods; then these need to be defined (see the sketch after this list).

  4. In general, looking at the discussions around some of the AD packages, there is a sentiment that great AD tools are just around the corner, and that maintaining and developing the existing ones is perhaps wasted effort (e.g. #81 in Nabla.jl, ReverseDiff.jl's README, and some others; I won't link all of them). While this is understandable, and projects based on e.g. Cassette.jl are indeed very promising, I think there would be value in keeping the existing tools working during the transition period, which may take many months, if not years.
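
Regarding the missing-rule point in item 3: for scalar kernels, one often does not need a hand-written adjoint at all. ReverseDiff.jl's `@forward` macro declares that a function of real arguments should be differentiated with ForwardDiff dual numbers instead of being tracked operation-by-operation on the tape, which sidesteps missing tape rules inside the function. A minimal sketch (the `softplus` kernel is just an illustration):

```julia
using ReverseDiff

# Differentiate this scalar kernel with forward-mode dual numbers,
# bypassing any missing tape rules for the operations inside it.
ReverseDiff.@forward softplus(x::Real) = log1p(exp(x))

# The kernel then composes with the rest of the tape as usual.
f(x) = sum(softplus.(x))
ReverseDiff.gradient(f, [-1.0, 0.0, 2.0])
```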
