Do there exist any benchmark results indicating the performance of ForwardDiff.jl compared to operator-overloading approaches in C++ (e.g. FADBAD++)?
I am not aware of any direct benchmarks, but I know ForwardDiff.jl should be very competitive, for the following reasons:
- Purely stack-allocated dual numbers.
- Specialization of every called function on the number of partials (a constant from the compiler's point of view).
- SIMD used in the computations involving partials (when starting Julia with `-O3`).
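For readers unfamiliar with the approach, here is a minimal illustrative sketch (in Python, not ForwardDiff's actual implementation, which uses immutable Julia structs specialized on the number of partials) of the operator-overloading dual-number idea underlying both FADBAD++ and ForwardDiff.jl:

```python
# Minimal dual-number sketch of forward-mode AD via operator overloading.
# Each value carries its derivative; arithmetic propagates both at once.
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    val: float  # primal value
    der: float  # derivative carried alongside

    def _lift(self, x):
        # Treat plain numbers as constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def derivative(f, x):
    # Seed the input with derivative 1, then read off the propagated partial.
    return f(Dual(x, 1.0)).der

# d/dx (x^2 + 3x) at x = 2 is 2x + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # -> 7.0
```

In Julia (or C++ with templates), the same idea compiles down to plain scalar arithmetic, which is where the stack allocation and specialization listed above pay off.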
You're talking about the implementation, not the algorithm. There are lots of ways to efficiently implement a slow algorithm.
/rant over
That is not to say ForwardDiff's algorithm is slow; I have no idea whether it is or not, just that you need to compare both the algorithm and the implementation.
Could you please tell me about the different forward mode automatic differentiation algorithms using operator overloading?
Because if only one is used in practice, so that one can speak of the algorithm, your rant would look silly indeed.
Certainly:

1. A big difference is perturbation confusion. This can be a big performance problem, particularly for operator-overloading implementations of AD, because guarding against it might require introducing a lot of conditional logic.

2. Another difference is support for higher-order derivatives. (A particularly fun test case is derivatives of trigonometric functions, because the derivatives repeat; some implementations can figure this out and some can't.)

3. Some packages support parallel constructs (OpenMP/MPI), some don't.

4. Dealing with sparsity is another factor (for computing entire Jacobians/Hessians).

5. Support for computing derivatives with respect to multiple variables in a single pass (sometimes called "vector mode"). This helps amortize the cost of the primal evaluation.
For example, ADOL-C supports 2, 3, and 4. I'm pretty sure it can do 5 as well, based on work published by the authors. I can't find any mention of 1, so I suspect it doesn't handle it (although I haven't read the full user's manual).
edit: added 5
Responding to these points for ForwardDiff:

1. Yes, ForwardDiff protects against perturbation confusion. All perturbation-confusion logic is computed at compile time, so no runtime cost is incurred.
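For illustration, here is what the confusion looks like in a tagless dual-number sketch (plain Python, not any real package's code): in a nested derivative, the outer variable's perturbation leaks into the inner one.

```python
# Tagless dual numbers exhibit "perturbation confusion": the inner
# derivative cannot tell the outer perturbation apart from its own.
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    val: float
    der: float

    def _lift(self, x):
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def derivative(f, x):
    return f(Dual(x, 1.0)).der

# d/dx [ x * d/dy (x + y) ] at x = 1: the inner derivative is 1,
# so the true answer is d/dx (x * 1) = 1.
nested = derivative(lambda x: x * derivative(lambda y: x + y, 1.0), 1.0)
print(nested)  # -> 2.0 (wrong): the two perturbations were conflated
```

A tag-checking implementation would treat the outer perturbation as a constant inside the inner call; doing that check at runtime is the conditional-logic cost mentioned above, which is why resolving the tags at compile time avoids it.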

2. Yes, ForwardDiff supports arbitrarily nested differentiation. Depending on your use case (e.g. computing extremely high-order derivatives), TaylorSeries.jl may be easier to use and faster (at the cost of memory usage).
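As a rough sketch of how nesting yields higher-order derivatives (illustrative Python with hypothetical `dsin`/`dcos` helpers defined here; ForwardDiff's nested dual types work along these lines), note how the repeating trigonometric pattern mentioned earlier falls out naturally:

```python
# Sketch of higher-order derivatives by nesting duals: a Dual whose
# components are themselves Duals differentiates twice.
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    val: object  # may itself be a Dual; nesting gives higher orders
    der: object

    def _lift(self, x):
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def dsin(x):
    # sin extended to duals via the chain rule: d(sin u) = cos(u) du.
    if isinstance(x, Dual):
        return Dual(dsin(x.val), x.der * dcos(x.val))
    return math.sin(x)

def dcos(x):
    if isinstance(x, Dual):
        return Dual(dcos(x.val), -(x.der * dsin(x.val)))
    return math.cos(x)

def derivative(f, x):
    return f(Dual(x, 1.0)).der

# sin'' = -sin: the nested derivative recovers the repeating pattern.
d2 = derivative(lambda x: derivative(dsin, x), 0.5)
print(d2, -math.sin(0.5))
```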

3. This depends on what you mean. You can definitely use ForwardDiff's dual numbers with Julia's existing parallel constructs, but no work has been done towards supporting specific use cases (or non-Julia parallel constructs like MPI/OpenMP).

4. Sparsity exploitation isn't really in scope for ForwardDiff. ForwardDiff purposefully doesn't include any computational-graph framework, which most modern sparsity-exploitation algorithms require. ForwardDiff works well as a dependency of other sparse AD tools (like JuMP's ReverseDiffSparse), where it can be used to efficiently compute Jacobian-vector and Hessian-vector products.
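To illustrate the Jacobian-vector product role described here (a plain-Python sketch, not ReverseDiffSparse's actual machinery): seeding the inputs with a direction v makes each output's partial equal the corresponding entry of J(x)·v, from a single primal pass.

```python
# Dual-number sketch of a Jacobian-vector product: seed inputs with a
# direction v; each output's partial is then a component of J(x)·v.
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    val: float
    der: float

    def _lift(self, x):
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def jvp(f, x, v):
    # d/dt f(x + t*v) at t = 0, i.e. J(x)·v, from one evaluation of f.
    duals = [Dual(xi, vi) for xi, vi in zip(x, v)]
    return [yi.der for yi in f(duals)]

# f(x) = (x0*x1, x1^2) has J = [[x1, x0], [0, 2*x1]].
# At x = (2, 3) with v = (1, 0): J·v = (3, 0).
print(jvp(lambda x: [x[0] * x[1], x[1] * x[1]], [2.0, 3.0], [1.0, 0.0]))
```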

5. Yes, ForwardDiff supports this. Furthermore, vector mode is a subset of ForwardDiff's "chunk mode", which can be tuned to make better use of memory bandwidth than traditional vector mode.
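A small illustrative sketch of vector mode (in Python; ForwardDiff stores its partials in a fixed-length tuple whose length is the chunk size): each value carries one partial per input variable, so a single evaluation of f produces the whole gradient.

```python
# Illustrative "vector mode" sketch: each value carries one partial per
# input variable, so one evaluation of f yields every gradient entry.
from dataclasses import dataclass

@dataclass(frozen=True)
class VDual:
    val: float
    parts: tuple  # one partial derivative per input variable

    def _lift(self, x):
        return x if isinstance(x, VDual) else VDual(x, (0.0,) * len(self.parts))

    def __add__(self, other):
        other = self._lift(other)
        return VDual(self.val + other.val,
                     tuple(a + b for a, b in zip(self.parts, other.parts)))
    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule applied slot-wise across all partials.
        return VDual(self.val * other.val,
                     tuple(da * other.val + self.val * db
                           for da, db in zip(self.parts, other.parts)))
    __rmul__ = __mul__

def gradient(f, xs):
    # One-hot seeds: input i carries a 1 in partial slot i.
    n = len(xs)
    seeds = [VDual(x, tuple(1.0 if i == j else 0.0 for j in range(n)))
             for i, x in enumerate(xs)]
    return f(*seeds).parts

# f(x, y) = x*y + x has gradient (y + 1, x); at (2, 3) that's (4.0, 2.0).
print(gradient(lambda x, y: x * y + x, (2.0, 3.0)))  # -> (4.0, 2.0)
```

Roughly speaking, chunk mode splits the n partials into fixed-size pieces and re-runs the primal once per chunk, trading repeated primal evaluations for bounded per-call memory.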