ReverseDiff.jl

package announcement

#1

Hi all,

I’m pleased to announce that the first version of ReverseDiff is now registered. ReverseDiff is brought to you by the same folks who develop and maintain the ForwardDiff package.

ReverseDiff implements methods to take gradients, Jacobians, Hessians, and higher-order derivatives of native Julia functions (or any callable object, really) using reverse mode automatic differentiation (AD). While performance can vary depending on the functions you evaluate, the algorithms implemented by ReverseDiff generally outperform non-AD algorithms in both speed and accuracy.
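
To give a flavor of the API, here's a minimal sketch (the function and input are made up for illustration; the call mirrors ReverseDiff's ForwardDiff-style `gradient` API):

```julia
using ReverseDiff

# a scalar-valued function of a vector input (illustrative example)
f(x) = sum(abs2, x) / 2

x = [1.0, 2.0, 3.0]

# reverse-mode gradient; for f(x) = sum(x.^2)/2, the gradient is x itself
g = ReverseDiff.gradient(f, x)
# g == [1.0, 2.0, 3.0]
```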

Why use ReverseDiff as your go-to reverse-mode AD package? Here are some reasons:

  • supports a large subset of the Julia language, including loops, recursion, and control flow
  • user-friendly API for reusing and compiling tapes
  • user-friendly performance annotations such as @forward and @skip (with more to come!)
  • compatible with ForwardDiff, enabling mixed-mode AD
  • built-in definitions leverage the benefits of ForwardDiff’s Dual numbers (e.g. SIMD, zero-overhead arithmetic)
  • a familiar differentiation API for ForwardDiff users
  • non-allocating linear algebra optimizations
  • nested differentiation
  • suitable as an execution backend for graphical machine learning libraries
  • ReverseDiff doesn’t need to record scalar indexing operations (a huge cost for many similar libraries)
  • higher-order map and broadcast optimizations
  • it’s well tested
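
For example, the tape reuse and compilation mentioned above might look like this (a sketch using ReverseDiff's `GradientTape`/`compile`/`gradient!` API; the function itself is made up):

```julia
using ReverseDiff

f(x) = sum(x .* x)

# record f's execution trace once, to a reusable tape
tape = ReverseDiff.GradientTape(f, rand(4))

# optionally compile the tape for faster repeated execution
ctape = ReverseDiff.compile(tape)

# replay the tape on new inputs without re-recording f
out = zeros(4)
ReverseDiff.gradient!(out, ctape, [1.0, 2.0, 3.0, 4.0])
# out ≈ [2.0, 4.0, 6.0, 8.0]
```

Note that replaying a recorded tape assumes the new input follows the same code path as the input used for recording (e.g. the same branches are taken).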

In the future, we aim to add GPU support, sparsity exploitation, and more optimized linear algebra derivatives to the package.

Compared to ForwardDiff, ReverseDiff’s methods are algorithmically more efficient for differentiating functions where the input dimension is larger than the output dimension. In general, the optimal choice between ForwardDiff and ReverseDiff for a given problem requires some nuance, so I’ve written up some tips to guide your choice.
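
To make the dimensionality point concrete, here's a hedged sketch of the regime where reverse mode shines: a function from R^1000 to R, where reverse mode computes the full gradient in a single backwards sweep, while forward mode needs roughly `length(x) / chunk_size` forward sweeps:

```julia
using ForwardDiff, ReverseDiff

# many inputs, one output: reverse mode's sweet spot
f(x) = sum(sin.(x))
x = rand(1000)

gr = ReverseDiff.gradient(f, x)  # one forward + one backwards sweep
gf = ForwardDiff.gradient(f, x)  # ~length(x)/chunk_size forward sweeps

# both agree with the analytic gradient, cos.(x)
```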

Internally, ReverseDiff contains facilities for recording execution traces of native Julia code to reusable, compilable instruction tapes, as well as mechanisms for propagating values “forwards” and “backwards” through these tapes. Since these tapes can be analyzed as computation graphs, my hope is that this infrastructure can eventually be rendered useful for non-AD purposes, such as performance optimization, scheduled parallel execution, and constraint programming. Feel free to reach out if you’re interested in exploring this area!


#2

Congrats @jrevels - This is exciting!


#3

This is awesome! ForwardDiff.jl is already one of my favorite parts of the Julia ecosystem, and I’m thrilled to be able to try out reverse-mode just as easily.


#4

Congrats on the announcement @jrevels! This is very cool, the benchmark results are extremely impressive, and I like the possibilities of the instruction tape work. Can’t wait to see what people build on top of this.


#5

When trying to add the package, I get

ERROR: unsatisfiable package requirements detected: no feasible version could be found for package: DiffBase

Any suggestions as to what might have gone wrong?


#6

Here’s ReverseDiff’s REQUIRE file; it seems you have a package installed somewhere that conflicts with these version requirements.

For example, ReverseDiff requires ForwardDiff 0.3.2 or higher, but a lot of popular packages (like JuMP and Optim) haven’t made the switch over yet (they’ll be switched over soon, though).

I’m not really sure why you’d see that error with DiffBase, though. The only packages I know of that currently use it are ForwardDiff and ReverseDiff, and neither of them places an upper bound on it.


#7

Hooray! For those who haven’t looked at the code, it’s really impressive how much @jrevels has managed to generalize and optimize the algorithm to support a broad base of Julia functionality.

This functionality should be really helpful to those implementing Julia machine learning algorithms going forward.


#8

“Since these tapes can be analyzed as computation graphs, my hope is that this infrastructure can eventually be rendered useful for non-AD purposes, such as performance optimization, scheduled parallel execution, and constraint programming.”

Oh, that’s really interesting, @jrevels. A lot of probabilistic programming is formulated in terms of “execution traces,” usually implemented with coroutines or continuations. Computing gradients is also really common in that context; maybe there’s a way to combine the two?


#9

@cscherrer that sounds a lot like what Cassette.jl is aiming to do.


#10

Oh interesting, I hadn’t seen Cassette.