Are there any results comparing ForwardDiff to an operator overloading approach in C++?

question

#1

Are there any benchmark results comparing the performance of ForwardDiff.jl with operator-overloading approaches in C++ (e.g. FADBAD++)?


#2

I am not aware of any direct benchmarks, but I know ForwardDiff.jl should be very competitive for the following reasons.

  • Purely stack-allocated dual numbers.
  • Specialization of all called functions on the number of partials (a constant from the compiler’s point of view).
  • SIMD is used in the computations involving partials (when starting Julia with -O3).
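
For readers unfamiliar with the term, here is a minimal sketch of an operator-overloading dual number that carries a fixed-length tuple of partials. It is written in Python purely for illustration, so it cannot show stack allocation, compile-time specialization, or SIMD. The names `Dual`, `dsin`, and `gradient` are made up for this sketch and are not ForwardDiff's API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Illustrative dual number: a value plus one partial per seeded input."""
    value: float
    partials: tuple

    def _lift(self, other):
        # Treat plain numbers as constants (all partials zero).
        if isinstance(other, Dual):
            return other
        return Dual(float(other), (0.0,) * len(self.partials))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.value + o.value,
                    tuple(a + b for a, b in zip(self.partials, o.partials)))

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule applied to every partial at once.
        return Dual(self.value * o.value,
                    tuple(self.value * b + o.value * a
                          for a, b in zip(self.partials, o.partials)))

    __rmul__ = __mul__


def dsin(x):
    """sin for Dual inputs: chain rule scales each partial by cos(value)."""
    return Dual(math.sin(x.value),
                tuple(math.cos(x.value) * p for p in x.partials))


def gradient(f, xs):
    """Seed input i with the i-th unit vector and evaluate f once."""
    n = len(xs)
    seeds = [Dual(float(x), tuple(1.0 if i == j else 0.0 for j in range(n)))
             for i, x in enumerate(xs)]
    return f(*seeds).partials


def f(x, y):
    return x * y + dsin(x)


g = gradient(f, [2.0, 3.0])  # partials of f with respect to x and y
```

The point of the first two bullets is that in ForwardDiff the length of `partials` is known to the compiler, so the per-partial loops above can be unrolled and vectorized rather than iterated at run time.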

#3

You’re talking about the implementation, not the algorithm. There are lots of ways to efficiently implement a slow algorithm.

/rant over

That is not to say ForwardDiff’s algorithm is slow (I have no idea whether it is or not), just that you need to compare both the algorithm and the implementation.


#4

Could you please tell me about the different forward mode automatic differentiation algorithms using operator overloading?

Because if only one is used in practice, so that one can simply talk about “the algorithm”, your rant would look silly indeed.


#5

Certainly:

  1. A big difference is perturbation confusion. This can be a big performance problem, particularly for operator overloading implementations of AD because it might require introducing a lot of conditional logic.

  2. Another difference is support for higher-order derivatives. A particularly fun test case is derivatives of trigonometric functions, because the derivatives repeat; some implementations can figure this out and some can’t.

  3. Some packages support parallel constructs (OpenMP/MPI), some don’t.

  4. Dealing with sparsity is another factor (for computing entire Jacobians/Hessians).

  5. Support for computing derivatives wrt multiple variables in a single pass (sometimes called “vector mode”). This helps amortize the cost of the primal evaluation.

For example, ADOL-C supports 2, 3, and 4. I’m pretty sure it can do 5 as well, based on works published by the authors. I can’t find any mention of 1, so I suspect it doesn’t handle it (although I haven’t read the full user’s manual).
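
To make point 1 concrete, here is a small sketch of perturbation confusion using a deliberately naive, untagged dual number in Python. The `Dual` class and `derivative` helper are hypothetical and written only for this illustration; they are not taken from ADOL-C, FADBAD++, or ForwardDiff. The nested expression is d/dx [x * d/dy (x + y)] at x = 1, whose correct value is 1.

```python
class Dual:
    """Untagged dual number value + deriv * eps (deliberately naive)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def derivative(f, x):
    """Naive forward mode: seed with eps, evaluate, read off the eps part."""
    out = f(Dual(x, 1.0))
    return out.deriv if isinstance(out, Dual) else 0.0


def outer(x):
    # Mathematically d/dy (x + y) = 1, but x already carries the outer
    # perturbation, and nothing distinguishes it from the inner one.
    inner = derivative(lambda y: x + y, 1.0)
    return x * inner


naive = derivative(outer, 1.0)  # evaluates to 2.0; the correct answer is 1.0
```

The inner call cannot tell the perturbation attached to `x` apart from the one it seeded on `y`, so it reports 2 instead of 1 and the error propagates outward. Keeping the perturbations distinct, for example by tagging each differentiation level, is what avoids this; doing that bookkeeping at run time is where the conditional logic mentioned in point 1 would come from.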

edit: added 5


#6

Responding to these points for ForwardDiff:

  1. Yes, ForwardDiff protects against perturbation confusion. All perturbation confusion logic is computed at compile time, so no runtime cost is incurred.

  2. Yes, ForwardDiff supports arbitrary nested differentiation. Depending on your use case (e.g. computing extremely high-order derivatives) TaylorSeries.jl may be easier to use and faster (at the cost of memory usage).

  3. This depends on what you mean. You can definitely use ForwardDiff’s dual numbers with Julia’s existing parallel constructs, but no work has been done towards supporting specific use cases (or non-Julia parallel constructs like MPI/OpenMP).

  4. Sparsity exploitation isn’t really in the scope of ForwardDiff. ForwardDiff purposefully doesn’t include any computational graph framework, which is a requirement of most modern sparsity exploitation algorithms. ForwardDiff works well as a dependency of other sparse AD tools (like JuMP’s ReverseDiffSparse), where it can be used to efficiently compute Jacobian/Hessian-vector products.

  5. Yes, ForwardDiff supports this. Furthermore, vector-mode is a subset of ForwardDiff’s “chunk-mode”, which can be tuned to make better use of memory bandwidth than traditional vector-mode.
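
As a rough illustration of the chunking idea in point 5, the sketch below computes a gradient by seeding at most `chunk` inputs per pass, so each dual number carries only `chunk` partials and the function is evaluated once per chunk. This is a Python sketch under my own assumptions, with hypothetical names (`Dual`, `chunked_gradient`, `sum_of_squares`); it is not ForwardDiff's implementation and says nothing about its actual memory behaviour.

```python
class Dual:
    """Illustrative dual number with a short tuple of partials."""

    def __init__(self, value, partials):
        self.value = value
        self.partials = tuple(partials)

    def _lift(self, other):
        if isinstance(other, Dual):
            return other
        return Dual(other, (0.0,) * len(self.partials))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.value + o.value,
                    [a + b for a, b in zip(self.partials, o.partials)])

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        return Dual(self.value * o.value,
                    [self.value * b + o.value * a
                     for a, b in zip(self.partials, o.partials)])

    __rmul__ = __mul__


def chunked_gradient(f, xs, chunk):
    """Gradient of f at xs using ceil(len(xs) / chunk) passes."""
    n = len(xs)
    grad = [0.0] * n
    passes = 0
    for start in range(0, n, chunk):
        width = min(chunk, n - start)
        seeds = []
        for i, x in enumerate(xs):
            partials = [0.0] * width
            if start <= i < start + width:
                partials[i - start] = 1.0  # unit seed for inputs in this chunk
            seeds.append(Dual(x, partials))
        out = f(seeds)
        grad[start:start + width] = out.partials
        passes += 1
    return grad, passes


def sum_of_squares(v):
    acc = v[0] * v[0]
    for x in v[1:]:
        acc = acc + x * x
    return acc


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
grad_small, passes_small = chunked_gradient(sum_of_squares, xs, chunk=2)
grad_full, passes_full = chunked_gradient(sum_of_squares, xs, chunk=5)
```

With `chunk` equal to the input length this reduces to ordinary vector mode (one pass, wide duals); smaller chunks trade extra primal evaluations for narrower duals.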