Fate of ReverseDiffSource

cscherrer · December 13, 2017, 5:53pm

Though it has some constraints, source-rewrite autodiff is appealing where it applies, because of the potential efficiency of the generated code. This matters especially for reverse-mode autodiff, whose most common application is evaluating gradients, which are usually computed many times in optimization or sampling algorithms.
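To make "source rewrite" concrete, here is a minimal sketch. It is written in Python rather than Julia and does not reflect how ReverseDiffSource itself works. Given a straight-line program as a list of assignments, it emits the source of a gradient function (a forward sweep followed by a reverse sweep of chain-rule updates) and compiles it once with `exec`. The step format and the names `make_gradient_source` and `compile_gradient` are hypothetical. The point is that the generated function is plain arithmetic with no tape or graph at run time, which is where the efficiency for repeatedly evaluated gradients comes from.

```python
import math

# A straight-line program: each step is (target, op, args).
# Hypothetical mini-format for illustration; supported ops are "add", "mul", "sin".
FORWARD = {
    "add": "{0} + {1}",
    "mul": "{0} * {1}",
    "sin": "math.sin({0})",
}

def make_gradient_source(inputs, steps):
    """Emit Python source for a function returning (value, gradient tuple)."""
    lines = [f"def grad({', '.join(inputs)}):"]
    # Forward sweep: replay the primal computation as plain assignments.
    for target, op, args in steps:
        if op not in FORWARD:
            raise ValueError(f"unsupported op: {op}")
        lines.append(f"    {target} = " + FORWARD[op].format(*args))
    # Reverse sweep: zero every adjoint, seed the output with 1.0,
    # then apply the chain rule to the steps in reverse order.
    for name in list(inputs) + [t for t, _, _ in steps]:
        lines.append(f"    d_{name} = 0.0")
    out = steps[-1][0]
    lines.append(f"    d_{out} = 1.0")
    for target, op, args in reversed(steps):
        if op == "add":
            lines.append(f"    d_{args[0]} += d_{target}")
            lines.append(f"    d_{args[1]} += d_{target}")
        elif op == "mul":
            lines.append(f"    d_{args[0]} += d_{target} * {args[1]}")
            lines.append(f"    d_{args[1]} += d_{target} * {args[0]}")
        elif op == "sin":
            lines.append(f"    d_{args[0]} += d_{target} * math.cos({args[0]})")
    grads = ", ".join(f"d_{name}" for name in inputs)
    lines.append(f"    return {out}, ({grads},)")
    return "\n".join(lines)

def compile_gradient(inputs, steps):
    """Generate the gradient source once and compile it into a callable."""
    src = make_gradient_source(inputs, steps)
    namespace = {"math": math}
    exec(src, namespace)
    return namespace["grad"], src

if __name__ == "__main__":
    # f(x, y) = sin(x * y) + x
    steps = [("t1", "mul", ["x", "y"]), ("t2", "sin", ["t1"]), ("f", "add", ["t2", "x"])]
    grad, src = compile_gradient(["x", "y"], steps)
    print(src)            # the generated, tape-free gradient code
    print(grad(1.0, 2.0))
```

The code generation cost is paid once. After that, each gradient call is an ordinary function call, which is what makes this approach attractive inside optimization or sampling loops.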

ReverseDiffSource seems like a great fit for a lot of applications, but it hasn’t been updated in a year and doesn’t seem to work at all on Julia v0.6. I opened an issue about this last month and haven’t heard anything.

I forked the repo and connected FemtoCleaner, but the pull request it sent only updated one file, and the package still didn’t work. I tried to fix a few more things, with no luck.

Is ReverseDiffSource dead? Is there something I should try in its place?

rdeits · December 13, 2017, 6:38pm

I don’t know about the status of that particular package, but I do know that FemtoCleaner won’t run until you update your REQUIRE file to make julia 0.6 the minimum required version. That’s because its updates aren’t compatible with v0.5 syntax, so it can’t run on repos that still claim to support v0.5.

rdeits · December 13, 2017, 6:41pm

But, that said, FemtoCleaner is supposed to help fix future deprecations (that is, deprecations from v0.6 to v0.7/1.0), so it won’t generally fix things that just plain don’t work on v0.6.

Can you give any examples of what problems you’re running into?

cscherrer · December 13, 2017, 9:35pm

Ok, I updated the REQUIRE, and Femtobot made another pull request (pushed to my fork of the repo). But I’m still getting this error when I load the package:

ERROR: LoadError: LoadError: UndefVarError: TypeConstructor not defined
Stacktrace:
 [1] include_from_node1(::String) at ./loading.jl:569
 [2] include(::String) at ./sysimg.jl:14
 [3] include_from_node1(::String) at ./loading.jl:569
 [4] include(::String) at ./sysimg.jl:14
while loading /home/chad/git/ReverseDiffSource.jl/src/zeronode.jl, in expression starting on line 23
while loading /home/chad/git/ReverseDiffSource.jl/src/ReverseDiffSource.jl, in expression starting on line 47

rdeits · December 13, 2017, 10:06pm

Interesting. It looks like TypeConstructor became UnionAll in https://github.com/JuliaLang/julia/pull/18457

dfdx · December 13, 2017, 10:33pm

Take a look at XGrad.jl. I started it as a redesign of ReverseDiffSource.jl with slightly different requirements (e.g. I wasn’t interested in higher-order derivatives, but wanted to automatically differentiate nested function calls), but overall the two should be similar. I’m currently testing it on machine learning tasks (e.g. see VariationalAE.jl or Milk.jl) and would be grateful for any feedback.

cscherrer · December 14, 2017, 3:10am

Hmm, maybe I could just switch them out? I had looked around for documentation of TypeConstructor and hadn’t had any luck.

cscherrer · December 14, 2017, 3:17am

Oh nice, XGrad does look pretty interesting. I’m thinking of ML applications as well. Do you have any benchmarks against alternatives like ReverseDiff or ForwardDiff? I know reverse mode is algorithmically better for gradients, but many Julia projects seem to prefer ForwardDiff for some reason. Oh, and there’s also AutoGrad, which is used by Knet.

dfdx · December 14, 2017, 8:35am

For ML tasks with thousands and millions of inputs and a single output (e.g. loss) forward-mode AD is terribly slow, but there are many other tasks for which it shines.
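The "many inputs, one output" argument can be seen in a small tape-based sketch. It is written in Python and follows, loosely, the operator-overloading style that ReverseDiff.jl and AutoGrad use. It is not their code, and `Tape`, `Var`, `gradient` and `loss` are hypothetical names. One forward evaluation records the tape, and one backward sweep produces every partial derivative, so computing the gradient does not require one pass per input the way forward mode does.

```python
class Tape:
    """Records every node in creation order, which is already a valid topological order."""
    def __init__(self):
        self.nodes = []

class Var:
    """A scalar on the tape, remembering its inputs and the local partial w.r.t. each."""
    def __init__(self, tape, value, parents=()):
        self.tape = tape
        self.value = value
        self.parents = parents  # tuple of (input Var, d(self)/d(input))
        self.grad = 0.0
        tape.nodes.append(self)

    def _lift(self, other):
        # Wrap plain numbers as constant nodes so they can join the graph.
        return other if isinstance(other, Var) else Var(self.tape, float(other))

    def __add__(self, other):
        other = self._lift(other)
        return Var(self.tape, self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__  # lets the built-in sum() start from 0

    def __sub__(self, other):
        other = self._lift(other)
        return Var(self.tape, self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = self._lift(other)
        return Var(self.tape, self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

def gradient(f, xs):
    """One forward evaluation records the tape; one backward sweep yields every partial."""
    tape = Tape()
    inputs = [Var(tape, float(x)) for x in xs]
    out = f(inputs)
    out.grad = 1.0
    # Later nodes only depend on earlier ones, so a reversed sweep sees each
    # node's full adjoint before pushing it to that node's inputs.
    for node in reversed(tape.nodes):
        for parent, local in node.parents:
            parent.grad += node.grad * local
    return out.value, [v.grad for v in inputs]

def loss(xs):
    # A scalar loss of many inputs: sum_i (x_i - 1)^2
    return sum((x - 1.0) * (x - 1.0) for x in xs)

if __name__ == "__main__":
    value, grads = gradient(loss, [0.0, 1.0, 3.0])
    print(value, grads)
```

The same single backward sweep works whether `xs` has three entries or a million. A forward-mode version would need one evaluation of `loss` per entry.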

There are two sets of benchmarks for XGrad: one for CPU (XGrad vs. ReverseDiff) and one for GPU (Arrays vs. CuArrays).

Note that ReverseDiff.jl has several tricks, described in another thread, that I wasn’t aware of when writing these benchmarks (that thread is about XDiff.jl, a previous incarnation of XGrad.jl, so don’t be confused by the differences). All in all, XGrad and ReverseDiff both apply a number of optimizations and should have very similar performance. If you see an inefficient part of XGrad or a high memory footprint, please report it.

In practice I always try to use CuArrays when possible, since they give a ~10x speedup on my machine. E.g.:

Compiling derivatives for CPU
  0.269616 seconds (290.32 k allocations: 36.576 MiB, 1.29% gc time)
Testing on CPU...
BenchmarkTools.Trial: 
  memory estimate:  1.15 MiB
  allocs estimate:  67
  --------------
  minimum time:     16.013 ms (0.00% GC)
  median time:      20.134 ms (0.00% GC)
  mean time:        22.887 ms (0.28% GC)
  maximum time:     80.884 ms (0.00% GC)
  --------------
  samples:          219
  evals/sample:     1

Compiling derivatives for GPU
  0.264454 seconds (407.77 k allocations: 27.281 MiB, 23.07% gc time)
Testing on GPU...
BenchmarkTools.Trial: 
  memory estimate:  408.38 KiB
  allocs estimate:  611
  --------------
  minimum time:     745.951 μs (0.00% GC)
  median time:      1.922 ms (38.08% GC)
  mean time:        1.973 ms (38.86% GC)
  maximum time:     4.364 ms (25.42% GC)
  --------------
  samples:          2529
  evals/sample:     1

ChrisRackauckas · December 14, 2017, 1:56pm

Well, three things there. First of all, reverse mode isn’t necessarily algorithmically better when you aren’t computing gradients. For example, small Jacobians and Hessians are great with forward-mode autodiff. So ForwardDiff was able to proliferate because it’s widely applicable and has fewer constraints: no need to build graphs, just shoot some Dual numbers through your function. Secondly, ReverseDiff is much newer than ForwardDiff, so there’s some inertia, but ForwardDiff is also just more battle-tested and at this point super reliable. ForwardDiff.jl is just a super great library! Lastly, there’s a lot of talk about a Cassette.jl-based ReverseDiff which could change it quite drastically, so that adds an “I’ll wait and see” kind of feeling. So because of history and some potential future breakage, many projects which could make good use of reverse-mode autodiff just haven’t switched yet.
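What "shoot some Dual numbers through your function" means can be sketched with a toy dual-number type. It is written in Python and is far simpler than ForwardDiff.jl’s Dual, which can carry several partials at once. `Dual`, `sin`, `jacobian` and `f` here are hypothetical names. No graph is built: the function simply runs on Duals, and each pass yields one column of the Jacobian.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """a + b*eps with eps**2 == 0: `val` is the value, `eps` carries one directional derivative."""
    val: float
    eps: float = 0.0

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule falls out of (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps
        return Dual(self.val * other.val, self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__

def sin(x):
    # Generic over floats and Duals, much like a Julia method written for ::Real.
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.eps)
    return math.sin(x)

def jacobian(f, xs):
    """Forward mode: one pass of f per input, seeding that input's eps with 1.0."""
    n = len(xs)
    columns = []
    for j in range(n):
        seeded = [Dual(x, 1.0 if i == j else 0.0) for i, x in enumerate(xs)]
        columns.append([y.eps for y in f(seeded)])  # column j: d f_i / d x_j
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def f(v):
    x, y = v
    return [x * y, sin(x) + y]

if __name__ == "__main__":
    print(jacobian(f, [2.0, 3.0]))  # analytically [[y, x], [cos(x), 1]]
```

A function of n inputs therefore needs n passes. That is cheap for small Jacobians, but it is exactly why forward mode gets slow for a scalar loss over millions of parameters.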

cscherrer · December 14, 2017, 11:14pm

Very nice, especially the GPU support!

You might also compare AutoGrad, which is easy to use and has been fast for the examples I’ve looked at. I know he’s working on CuArray support as well.

Also, I got ReverseDiffSource working on Julia v0.6 in my fork. I don’t understand much of the internals, but maybe it could be useful for comparison benchmarking.

cscherrer · December 14, 2017, 11:15pm

That’s helpful background, thank you @ChrisRackauckas

dfdx · December 15, 2017, 11:17am

Good benchmarks are hard: results depend on the function in question, input sizes, the current state of CPU and memory, etc. Below are some results just to give you a general impression, but it’s best to test it yourself. Also, suggestions for improving the benchmark code are welcome.

  # XGrad
  --------------
  minimum time:     1.778 s (0.00% GC)
  median time:      1.956 s (0.05% GC)
  mean time:        1.899 s (0.03% GC)
  maximum time:     1.963 s (0.05% GC)
  --------------

  # ReverseDiff
  --------------
  minimum time:     2.856 s (0.00% GC)
  median time:      2.862 s (0.00% GC)
  mean time:        2.862 s (0.00% GC)
  maximum time:     2.867 s (0.00% GC)
  --------------

  # AutoGrad
  --------------
  minimum time:     1.931 s (6.28% GC)
  median time:      2.000 s (6.07% GC)
  mean time:        2.000 s (6.65% GC)
  maximum time:     2.068 s (4.24% GC)
  --------------

As for ReverseDiffSource, I couldn’t get it working from your fork, even for the examples in the README.
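Since the advice is to test on your own functions, here is a minimal timing harness, sketched in Python, that reports the same minimum/median/mean/maximum summary BenchmarkTools.jl prints in the results above. The function name `trial` and its parameters are hypothetical. For Julia code, BenchmarkTools.jl’s `@benchmark` is the real tool. Warm-up runs matter there in particular, because the first call to a Julia function includes compilation.

```python
import statistics
import time

def trial(f, *args, samples=100, warmup=3):
    """Time f(*args) repeatedly and summarize the spread of run times, in seconds."""
    for _ in range(warmup):
        f(*args)  # warm-up runs (in Julia these would also trigger compilation)
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        f(*args)
        times.append(time.perf_counter() - start)
    return {
        "minimum": min(times),
        "median": statistics.median(times),
        "mean": statistics.mean(times),
        "maximum": max(times),
        "samples": samples,
    }

if __name__ == "__main__":
    for name, value in trial(sorted, list(range(10_000, 0, -1)), samples=50).items():
        print(f"{name:>8}: {value}")
```

Looking at the whole spread rather than a single number helps separate a real difference between packages from the CPU and memory noise mentioned above.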
