Just a follow-up to this topic: I am experimenting with incorporating many of the above tools into a common framework for computing gradients via AD (primarily for use in Bayesian inference).
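To make that concrete, here is a minimal sketch of the kind of common interface I have in mind. The `ADBackend` types and `gradient_via` are hypothetical names for illustration, not an existing API:

```julia
using Zygote, ReverseDiff

# Hypothetical backend markers; dispatching on these selects the AD package.
abstract type ADBackend end
struct ZygoteAD <: ADBackend end
struct ReverseDiffAD <: ADBackend end

# One entry point, several backends.
gradient_via(::ZygoteAD, f, x::AbstractVector) = first(Zygote.gradient(f, x))
gradient_via(::ReverseDiffAD, f, x::AbstractVector) = ReverseDiff.gradient(f, x)

# Example target: a log-density-like function, as in Bayesian inference.
logdensity(x) = -0.5 * sum(abs2, x)

gradient_via(ZygoteAD(), logdensity, [1.0, 2.0])       # [-1.0, -2.0]
gradient_via(ReverseDiffAD(), logdensity, [1.0, 2.0])  # [-1.0, -2.0]
```

The point of the dispatch-based design is that swapping the backend is a one-argument change, so when one package breaks on a given model it is cheap to fall back to another.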
I found that:
- When Zygote.jl works, it is amazing. But it is experimental, and the usual caveats apply.
- Flux.jl is pretty good for reverse-mode AD, but when it breaks, I find debugging difficult.
- ReverseDiff.jl is pretty reliable, except when it is missing AD rules for some methods; these then need to be defined manually (see the sketch after this list).
- In general, looking at the discussions around some of the AD packages, there is a sentiment that great AD tools are just around the corner, and that maintaining or developing the existing ones is perhaps wasted effort (e.g. #81 in Nabla.jl, ReverseDiff.jl's README, and some others; I won't link all of them). While this is understandable, and projects based on e.g. Cassette.jl are indeed very promising, I think there would be value in keeping the existing tools working during the transition period, which may take many months, if not years.
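Regarding the ReverseDiff.jl point above: defining a missing rule typically follows the `ReverseDiff.track` / `ReverseDiff.@grad` pattern. A minimal sketch, where `mysum` is a hypothetical stand-in for a method that ReverseDiff cannot differentiate through on its own:

```julia
using ReverseDiff
using ReverseDiff: TrackedArray, track, value, @grad

# Pretend this is a "black box" with no AD rule.
mysum(x::AbstractVector) = sum(x)

# Intercept tracked arrays and record the call on the tape...
mysum(x::TrackedArray) = track(mysum, x)

# ...then supply the primal value and a pullback: given the output
# adjoint Δ (a scalar here), return the input adjoints as a tuple.
@grad function mysum(x)
    xv = value(x)
    return mysum(xv), Δ -> (fill(Δ, size(xv)),)
end

# Sanity check: d/dx sum(x.^2) = 2x
ReverseDiff.gradient(x -> mysum(x .^ 2), [1.0, 2.0, 3.0])  # [2.0, 4.0, 6.0]
```

It is not hard once you have seen the pattern, but it does mean every missing rule is a small maintenance task, which is part of why I think keeping these packages alive is worthwhile.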