Though it has some constraints, source-rewrite autodiff is appealing when it applies, because of the potential efficiency of the generated code. That's especially true for reverse mode, whose most common application is evaluating gradients, which are typically computed many times inside optimization or sampling algorithms.
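Just to illustrate the point, here's a toy sketch (the gradient is hand-written, standing in for whatever an AD tool would generate): a plain gradient-descent loop re-evaluates the gradient on every iteration, so any per-call overhead gets multiplied by the number of steps.

f(x) = sum(abs2, x .- 1)        # toy objective
∇f(x) = 2 .* (x .- 1)           # stand-in for an autodiff-generated gradient

function descend(x; steps = 1_000, η = 0.1)
    for _ in 1:steps
        x = x .- η .* ∇f(x)     # one gradient evaluation per step
    end
    return x
end

descend(zeros(5))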
ReverseDiffSource seems like a great fit for a lot of applications, but it hasn’t been updated in a year, and doesn’t seem to work at all on Julia v0.6. I opened an issue for this here last month, and haven’t heard anything.
I forked the repo here and connected FemtoCleaner, but the pull request it sent only updated one file, and it still didn’t work. I tried to fix a few more things with no luck.
Is ReverseDiffSource dead? Is there something I should try in its place?
I don’t know about the status of that particular package, but I do know that FemtoCleaner won’t run until you update your REQUIRE file to make julia 0.6 the minimum required version. That’s because its updates aren’t compatible with v0.5 syntax, so it can’t run on repos that still claim to support v0.5.
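For reference, that just means the julia line in REQUIRE should read

julia 0.6

rather than julia 0.5 (or an unbounded julia entry); once that's the declared minimum, FemtoCleaner will consider the repo.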
But, that said, FemtoCleaner is supposed to help fix future deprecations (that is, deprecations from v0.6 to v0.7/1.0), so it won’t generally fix things that just plain don’t work on v0.6.
Can you give any examples of what problems you’re running into?
Ok, I updated the REQUIRE, and Femtobot made another pull request (pushed to my fork of the repo). But I’m still getting this error when I load the package:
ERROR: LoadError: LoadError: UndefVarError: TypeConstructor not defined
Stacktrace:
[1] include_from_node1(::String) at ./loading.jl:569
[2] include(::String) at ./sysimg.jl:14
[3] include_from_node1(::String) at ./loading.jl:569
[4] include(::String) at ./sysimg.jl:14
while loading /home/chad/git/ReverseDiffSource.jl/src/zeronode.jl, in expression starting on line 23
while loading /home/chad/git/ReverseDiffSource.jl/src/ReverseDiffSource.jl, in expression starting on line 47
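From some searching, my guess is that this is fallout from the v0.6 type-system overhaul, where TypeConstructor was removed and parametric types became UnionAll values. A purely hypothetical sketch of the kind of rewrite that's usually needed (I haven't worked out what zeronode.jl actually does on that line):

unwrap(T) = isa(T, UnionAll) ? Base.unwrap_unionall(T) : T   # v0.6-style replacement for a TypeConstructor check

but I don't understand the package internals well enough to finish the port myself.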
Take a look at XGrad.jl. I started it as a redesign of ReverseDiffSource.jl with slightly different requirements (e.g. I wasn’t interested in higher-order derivatives, but wanted to automatically differentiate nested function calls), but all in all they should be similar. Currently I test it on machine learning tasks (e.g. see VariationalAE.jl or Milk.jl) and would be thankful for any feedback.
Oh nice, XGrad does look pretty interesting. I’m thinking of ML applications as well. Do you have any benchmarks against alternatives like ReverseDiff or ForwardDiff? I know reverse mode is algorithmically better for gradients, but many Julia projects seem to prefer ForwardDiff for some reason. Oh, and there’s also AutoGrad, which is used for Knet.
For ML tasks with thousands or millions of inputs and a single output (e.g. a loss), forward-mode AD is terribly slow, but there are many other tasks for which it shines.
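A rough sketch of what I mean, assuming ForwardDiff and ReverseDiff are installed (the loss and size here are placeholders, not the benchmark below):

using ForwardDiff, ReverseDiff, BenchmarkTools

loss(x) = sum(abs2, x)                      # many inputs, one scalar output
x = rand(10_000)

@btime ForwardDiff.gradient($loss, $x)      # work grows with the input dimension
@btime ReverseDiff.gradient($loss, $x)      # one reverse sweep regardless of dimension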
There are two sets of benchmarks for XGrad: for CPU (XGrad vs. ReverseDiff) and for GPU (Arrays vs. CuArrays).
Note that ReverseDiff.jl has several tricks, described here, that I wasn’t aware of when writing the benchmarks (that thread is about XDiff.jl, a previous incarnation of XGrad.jl, so don’t be confused by the differences). All in all, XGrad and ReverseDiff both apply a number of optimizations and should have very similar performance. If you see an inefficient part of XGrad or a high memory footprint, please report it.
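The main trick there, as far as I understand, is pre-recording and compiling the tape so it can be reused across calls. Roughly, following the ReverseDiff README (with a placeholder loss and sizes):

using ReverseDiff

loss(W, x) = sum(W * x)                          # placeholder loss

W, x = rand(100, 100), rand(100)
tape  = ReverseDiff.GradientTape(loss, (W, x))   # record the operations once
ctape = ReverseDiff.compile(tape)                # compile the recorded tape

∇W, ∇x = similar(W), similar(x)
ReverseDiff.gradient!((∇W, ∇x), ctape, (W, x))   # reuse the compiled tape on new inputs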
In practice I always try to use CuArrays whenever possible, since they give a ~10× improvement on my machine. E.g.:
Compiling derivatives for CPU
0.269616 seconds (290.32 k allocations: 36.576 MiB, 1.29% gc time)
Testing on CPU...
BenchmarkTools.Trial:
memory estimate: 1.15 MiB
allocs estimate: 67
--------------
minimum time: 16.013 ms (0.00% GC)
median time: 20.134 ms (0.00% GC)
mean time: 22.887 ms (0.28% GC)
maximum time: 80.884 ms (0.00% GC)
--------------
samples: 219
evals/sample: 1
Compiling derivatives for GPU
0.264454 seconds (407.77 k allocations: 27.281 MiB, 23.07% gc time)
Testing on GPU...
BenchmarkTools.Trial:
memory estimate: 408.38 KiB
allocs estimate: 611
--------------
minimum time: 745.951 μs (0.00% GC)
median time: 1.922 ms (38.08% GC)
mean time: 1.973 ms (38.86% GC)
maximum time: 4.364 ms (25.42% GC)
--------------
samples: 2529
evals/sample: 1
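If you want to try the GPU path yourself, the gist is just moving the arrays over with CuArrays. A minimal sketch (the loss here is illustrative, not the one benchmarked above):

using CuArrays, BenchmarkTools

loss(W, x) = sum(W * x)                  # illustrative loss

W, x = rand(Float32, 1000, 1000), rand(Float32, 1000)
Wg, xg = CuArray(W), CuArray(x)          # same data, on the GPU

@btime loss($W, $x)                      # CPU timing
@btime loss($Wg, $xg)                    # GPU timing (CuArrays provides * and sum for CuArray)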
Well, three things there. First of all, reverse isn’t necessarily algorithmically better when you’re not computing gradients. For example, small Jacobians and Hessians are great with forward-mode autodiff. So ForwardDiff was able to proliferate because it’s widely applicable and has fewer constraints: no need to build graphs, just shoot some Dual numbers through your function. Secondly, ReverseDiff is much newer than ForwardDiff, so there’s some inertia, but also ForwardDiff is just more battle-tested and at this point super reliable, so yeah, inertia. ForwardDiff.jl is just a super great library! Lastly, there’s a lot of talk about a Cassette.jl-based ReverseDiff which could potentially change things quite drastically, and that adds an “I’ll wait and see” kind of feeling. So because of history and some potential future breakage, many projects which could make good use of reverse-mode autodiff just haven’t switched yet.
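To make the first point concrete, here’s the kind of thing forward mode handles nicely, plus the “Dual numbers through your function” idea (just an illustrative sketch):

using ForwardDiff

f(x) = [x[1]^2 + x[2], sin(x[1]) * x[2]]     # small input and output: the Jacobian is cheap in forward mode
J = ForwardDiff.jacobian(f, [1.0, 2.0])

g(x) = x[1]^2 * x[2] + x[2]^3
H = ForwardDiff.hessian(g, [1.0, 2.0])       # small Hessian, also fine

d = ForwardDiff.Dual(2.0, 1.0)               # value 2.0 with a derivative seed
sin(d)                                        # carries sin(2.0) and cos(2.0) along; no graph is built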
You might also compare AutoGrad, which is easy to use and has been fast for the examples I’ve looked at. I know he’s working on CuArray support as well.
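For reference, AutoGrad’s basic interface is just grad; a trivial scalar example:

using AutoGrad

f(x) = x^2 + 3x
df = grad(f)       # gradient with respect to the first argument
df(2.0)            # 7.0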
Also, I got ReverseDiffSource working on Julia v0.6 in my forked repo here. I don’t understand much of the internals, but maybe it could be useful for comparison benchmarking.
Good benchmarks are hard: results depend on the function in question, input sizes, the current state of CPU and memory, etc. Below are some results just to give you a general impression, but it’s best to test it yourself. Also, suggestions for improving the benchmark code are welcome.
# XGrad
--------------
minimum time: 1.778 s (0.00% GC)
median time: 1.956 s (0.05% GC)
mean time: 1.899 s (0.03% GC)
maximum time: 1.963 s (0.05% GC)
--------------
# ReverseDiff
--------------
minimum time: 2.856 s (0.00% GC)
median time: 2.862 s (0.00% GC)
mean time: 2.862 s (0.00% GC)
maximum time: 2.867 s (0.00% GC)
--------------
# AutoGrad
--------------
minimum time: 1.931 s (6.28% GC)
median time: 2.000 s (6.07% GC)
mean time: 2.000 s (6.65% GC)
maximum time: 2.068 s (4.24% GC)
--------------
As for ReverseDiffSource, I couldn’t get it working from your fork, even for the examples in the README.