When I calculate the output of a function and a directional derivative using forward-mode autodiff (ForwardDiff, TaylorDiff, etc.), is the calculation of the derivatives carried out in parallel?
By parallel I mean: assuming both my function and its derivatives are GPU-friendly (e.g. a neural network), does calculating the derivatives alongside the result make the overall evaluation slower? Or could I hope for “free” derivatives given enough memory?
Just to clarify, I think there are three ways to interpret your question:
1. If the function runs on the GPU (parallelizing over input), does the derivative run on the GPU too?
2. Assuming the derivative runs on the GPU (parallelizing over input), how much will one derivative slow down the primal program?
3. Assuming the derivative runs on the GPU (parallelizing over input), how many derivatives are computed simultaneously (parallelizing over directions/tangents)?
As far as ForwardDiff.jl is concerned:
Regarding 1: it depends on the operator. If I remember correctly, ForwardDiff.derivative will run fine on GPU arrays, but ForwardDiff.gradient will fail due to scalar indexing.
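A minimal sketch of that distinction, assuming CUDA.jl and ForwardDiff.jl are installed and a GPU is available (the functions `f`, `g` and the array `x` are just illustrative, and exact behavior can vary across package versions):

```julia
using CUDA, ForwardDiff

x = CUDA.rand(Float32, 1_000)

# ForwardDiff.derivative differentiates with respect to a scalar input, so the
# Dual number just flows through GPU broadcasts (Duals are isbits-friendly).
f(t) = sum(sin.(t .* x))
ForwardDiff.derivative(f, 1.0f0)   # typically works on GPU arrays

# ForwardDiff.gradient treats the array itself as the input and seeds/extracts
# partials element by element, which trips CUDA.jl's scalar-indexing guard.
g(v) = sum(sin, v)
# ForwardDiff.gradient(g, x)       # expected to error: scalar indexing disallowed
```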
Regarding 2: in forward mode, autodiff theory tells us that evaluation shouldn’t be slowed down too much when derivatives are propagated alongside the primals… but that’s not always true in practice. For instance, if the primal function takes an optimized code path for Matrix{Float64} (like a BLAS call), the derivative requires working with Matrix{Dual{Float64}}, which falls back to a much slower generic pure-Julia implementation.
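A rough illustration of that effect (the matrix size is arbitrary; timings are machine-dependent, and the first `@time` of each call includes compilation, so run it twice or use BenchmarkTools.@btime for a fair comparison):

```julia
using ForwardDiff, LinearAlgebra

A, B = rand(256, 256), rand(256, 256)

# Primal path: Matrix{Float64} * Matrix{Float64} dispatches to an optimized BLAS gemm.
@time A * B

# Dual path: attaching one partial per entry turns these into Matrix{Dual},
# so multiplication falls back to Julia's generic (non-BLAS) matmul.
D = ForwardDiff.Dual.(A, rand(256, 256))
E = ForwardDiff.Dual.(B, rand(256, 256))
@time D * E   # typically much slower than the BLAS call above
```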
Regarding 3: this is controlled by the so-called chunk size, which sets how many partials are stored in each Dual number, i.e. how many tangent directions are propagated simultaneously.
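For example, a small sketch using ForwardDiff’s documented chunk-size option (the function `f`, the input dimension, and the chunk size 10 are arbitrary choices for illustration):

```julia
using ForwardDiff

f(x) = sum(abs2, x)
x = rand(100)

# With Chunk{10}, each Dual carries 10 partials, so 10 tangent directions are
# propagated per pass and the 100-entry gradient takes ceil(100 / 10) = 10
# evaluations of f.
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{10}())
ForwardDiff.gradient(f, x, cfg)
```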