ANN: XDiff.jl - an expression differentiation package

Cool stuff!

This generator replaces all matrix multiplications with Ax_mul_Bx!() family of functions, fuses element-wise operations and removes any unused code.

ReverseDiff also does these things, so I looked into why it was running so slowly for your benchmark.

It turns out that Julia’s new parser-level broadcast fusion was bypassing ReverseDiff’s primitives, so the broadcast operations were getting unrolled in ReverseDiff’s tape. I’m surprised ReverseDiff was even as quick as it was, given the crazy number of operations it was doing!

Future versions of ReverseDiff won’t have this problem - fused broadcasts will be intercepted automatically as their own primitives. The machinery for this optimization actually already exists, and is used for some of ReverseDiff’s opt-in mixed-mode optimizations. The only thing holding back the more general version of this optimization was the lack of a tagging system for ForwardDiff’s Dual numbers (since the optimization could’ve resulted in perturbation confusion). Now that ForwardDiff has such a system to prevent perturbation confusion, ReverseDiff’s mixed-mode broadcast optimization will no longer need to be explicitly opt-in; it’ll just happen automatically. I just need to get around to updating ReverseDiff to use the latest version of ForwardDiff.
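For readers unfamiliar with perturbation confusion, the usual textbook case is a nested derivative in which the inner differentiation closes over the outer variable. The snippet below is only an illustrative sketch (the names `inner` and `outer` are made up for this example and are not part of ReverseDiff or XDiff); it assumes a ForwardDiff version that tags its Dual numbers.

```julia
using ForwardDiff

# Inner derivative: d/dy (x + y) evaluated at y = 1. Mathematically this is 1,
# no matter what x is. Note that x is captured from the outer differentiation.
inner(x) = ForwardDiff.derivative(y -> x + y, 1.0)

# Outer derivative: d/dx (x * inner(x)) at x = 1, i.e. d/dx (x * 1) = 1.
outer = ForwardDiff.derivative(x -> x * inner(x), 1.0)

# With tagged Duals the two perturbations are kept apart and the result is 1.0.
# An untagged implementation can mix them up and report 2.0 instead.
println(outer)
```

The same hazard appears when forward-mode Duals are used inside a reverse-mode primitive, which is why the tagging had to land before the broadcast optimization could be switched on by default.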

In the meantime, I altered the code to avoid parser fusion so we could get a better idea of ReverseDiff’s actual performance.
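As a rough illustration of what avoiding fusion can look like (a sketch in plain Julia, not necessarily what the altered benchmark code does): nested dot calls are fused into a single `broadcast` over an anonymous scalar kernel, whereas separate explicit `broadcast` calls on named functions each stay a distinct array-level operation that a tracing tool can intercept.

```julia
x = rand(5)

# Fused: the dot syntax is lowered to ONE broadcast over an anonymous
# scalar function, roughly broadcast(v -> tanh(2 * v + 1), x).
fused = tanh.(2 .* x .+ 1)

# Unfused: each explicit broadcast call is evaluated on its own and
# materializes an intermediate array.
t1 = broadcast(*, 2, x)
t2 = broadcast(+, t1, 1)
unfused = broadcast(tanh, t2)

# Both forms compute the same values; only the structure of the calls differs.
println(fused ≈ unfused)
```

The unfused form allocates temporaries, so it is not free; the point is only that each step goes through a named function applied to whole arrays.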

Here are the “large data” XDiff results on my machine for benchmark_autoencoder (using your unmodified code):

Compiling derivatives using XDiff
 75.817737 seconds (17.12 M allocations: 7.705 GiB, 5.87% gc time)
Testing XDiff...
BenchmarkTools.Trial:
  memory estimate:  9.95 MiB
  allocs estimate:  153
  --------------
  minimum time:     567.973 ms (0.00% GC)
  median time:      654.563 ms (0.00% GC)
  mean time:        649.178 ms (0.20% GC)
  maximum time:     739.510 ms (0.64% GC)
  --------------
  samples:          8
  evals/sample:     1

Here are the ReverseDiff results for the same benchmark, but using this gist:

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     611.937 ms (0.00% GC)
  median time:      622.834 ms (0.00% GC)
  mean time:        627.895 ms (0.00% GC)
  maximum time:     670.710 ms (0.00% GC)
  --------------
  samples:          8
  evals/sample:     1