Hi all,
I’m pleased to announce that the first version of ReverseDiff is now registered. ReverseDiff is brought to you by the same folks who develop and maintain the ForwardDiff package.
ReverseDiff implements methods to take gradients, Jacobians, Hessians, and higher-order derivatives of native Julia functions (or any callable object, really) using reverse mode automatic differentiation (AD). While performance can vary depending on the functions you evaluate, the algorithms implemented by ReverseDiff generally outperform non-AD algorithms in both speed and accuracy.
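For instance, here is a minimal sketch of taking a gradient (the function `f` and input size below are just illustrative placeholders, not anything special to the package):

```julia
using ReverseDiff

# an illustrative scalar-valued function of a vector input
f(x) = sum(sin.(x) .^ 2)

x = rand(5)

ReverseDiff.gradient(f, x)   # 5-element vector of partial derivatives ∂f/∂xᵢ
```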
Why use ReverseDiff as your go-to reverse-mode AD package? Here are some reasons:
- supports a large subset of the Julia language, including loops, recursion, and control flow
- user-friendly API for reusing and compiling tapes (see the sketch after this list)
- user-friendly performance annotations such as `@forward` and `@skip` (with more to come!)
- compatible with ForwardDiff, enabling mixed-mode AD
- built-in definitions leverage the benefits of ForwardDiff’s `Dual` numbers (e.g. SIMD, zero-overhead arithmetic)
- a familiar differentiation API for ForwardDiff users
- non-allocating linear algebra optimizations
- nested differentiation
- suitable as an execution backend for graphical machine learning libraries
- ReverseDiff doesn’t need to record scalar indexing operations (a huge cost for many similar libraries)
- higher-order `map` and `broadcast` optimizations
- it’s well tested
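As a quick illustration of the tape API mentioned above, here is a hedged sketch of recording, compiling, and reusing a gradient tape (the target function, input size, and loop count are placeholders):

```julia
using ReverseDiff

# placeholder target function
f(x) = sum(sin.(x) .^ 2)

x = rand(100)

# record f's execution trace to a reusable tape
tape = ReverseDiff.GradientTape(f, x)

# optionally compile the tape for faster re-execution
compiled_tape = ReverseDiff.compile(tape)

# reuse the compiled tape on new inputs without re-recording
out = similar(x)
for i in 1:3
    ReverseDiff.gradient!(out, compiled_tape, rand(100))
end
```

Compiling the tape pays off when the same function is differentiated many times at inputs of the same shape, since the recording cost is amortized across all subsequent calls.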
In the future, we aim to add GPU support, sparsity exploitation, and more optimized linear algebra derivatives to the package.
Compared to ForwardDiff, ReverseDiff’s methods are algorithmically more efficient for differentiating functions where the input dimension is larger than the output dimension. In general, the optimal choice between ForwardDiff and ReverseDiff for a given problem requires some nuance, so I’ve written up some tips to guide your choice.
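To make the dimension argument concrete, here is a rough sketch of the two calls side by side (the function `g` and the input size are illustrative only): for a map from many inputs to one output, ReverseDiff needs a single forward/reverse sweep, while ForwardDiff's cost grows with the number of inputs.

```julia
using ForwardDiff, ReverseDiff

# many inputs, one output: reverse mode shines here
g(x) = sum(exp.(-x .^ 2))

x = rand(10_000)

ReverseDiff.gradient(g, x)   # one forward sweep + one reverse sweep
ForwardDiff.gradient(g, x)   # cost scales with chunked passes over the 10_000 inputs
```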
Internally, ReverseDiff contains facilities for recording execution traces of native Julia code to reusable, compilable instruction tapes, as well as mechanisms for propagating values “forwards” and “backwards” through these tapes. Since these tapes can be analyzed as computation graphs, my hope is that this infrastructure can eventually be rendered useful for non-AD purposes, such as performance optimization, scheduled parallel execution, and constraint programming. Feel free to reach out if you’re interested in exploring this area!