How to think about the cost of vJ and Jv (pullback and pushforward)

In a calculation involving a function f: R^n \to R^m, where n ranges from 50 to a few hundred and m \ge n is of a similar scale, I have the option of either

  1. calculating the Jacobian J once and saving it, or

  2. calculating Jv, Jw, uJ for three different vectors v, w, u.

I can (and will) benchmark this, but I am wondering whether there is a heuristic way to think about the cost: e.g. if calculating J has O(nm) cost, can we say something similar about the others?

And does this depend on the AD mode and the interface?

(I will be using the excellent DifferentiationInterface.jl so I can experiment freely)


First of all, a few questions:

  • Is the Jacobian sparse?
  • How many times are you performing each operation? Does everything vary each time, or do you have a single x, which determines J_f(x), and many different triplets (u, v, w)?

Say your function f costs \tau to compute on one input x. Roughly speaking:

  • A single JVP (forward mode) or VJP (reverse mode) can be computed in time O(\tau)
  • A dense forward-mode Jacobian can be computed in time O(n \tau) (one JVP per basis vector of the input space)
  • A dense reverse-mode Jacobian can be computed in time O(m \tau) (one VJP per basis vector of the output space)
  • The memory use of reverse mode is much higher, which implies that the multiplicative coefficient inside the O is bigger. In your case, you should probably use forward mode to compute the full Jacobian.
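The counting argument in the bullets above can be made concrete with a toy sketch. Here central finite differences stand in for a forward-mode JVP (both cost a constant number of function evaluations), and the toy function f, its dimensions, and the call counter are all made up for illustration:

```python
import numpy as np

n, m = 4, 6
calls = 0

def f(x):
    """Toy dense map R^n -> R^m; each evaluation costs one 'tau'."""
    global calls
    calls += 1
    A = np.arange(1.0, m * n + 1).reshape(m, n)
    return A @ x + np.sin(x).sum()

def jvp(f, x, v, eps=1e-6):
    """Directional derivative J v via central differences: 2 evaluations, i.e. O(tau)."""
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

x = np.random.default_rng(0).normal(size=n)

calls = 0
Jv = jvp(f, x, np.ones(n))
print("one JVP costs", calls, "evaluations")          # 2 evaluations ~ O(tau)

calls = 0
J = np.column_stack([jvp(f, x, e) for e in np.eye(n)])
print("dense Jacobian costs", calls, "evaluations")   # 2n evaluations ~ O(n tau)
```

The second measurement is exactly the "one JVP per basis vector of the input space" recipe: n directional derivatives assembled column by column.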

So, as far as I understand your question, it will usually be faster to compute (Jv, Jw, uJ) than the full Jacobian. But if the Jacobian is sparse or if you reuse the same J for multiple triplets (u, v, w), the answer may differ.


Thanks for the clarifying questions, I should have thought about them.

  1. the Jacobian is not sparse: it is more than 70% nonzero most of the time,
  2. the point at which the Jacobian is evaluated differs for each triplet, so there is a different J every time.

What are the recommended forward-mode backends for vJ and Jv at this point besides ForwardDiff.jl?

What does your function look like? Does it have mutation? Is it type-generic?

At first glance, you should evaluate Jv with a forward-mode backend like ForwardDiff.jl or Enzyme.jl, and vJ with a reverse-mode backend like Enzyme.jl or Mooncake.jl.
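To see why vJ is the natural reverse-mode primitive, consider a composition f = g ∘ h: the VJP chains transposed Jacobians from the output side in a single backward sweep, at O(\tau) cost. Below is a hand-written toy sketch in numpy (not the DifferentiationInterface.jl API; the matrices A, B and the function f are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))   # inner layer: h(x) = A x
B = rng.normal(size=(7, 5))   # outer layer: g(y) = B tanh(y)

def f(x):
    return B @ np.tanh(A @ x)

def vjp(x, u):
    """u J_f(x): one forward pass caching intermediates, one reverse sweep."""
    y = A @ x                      # forward through h
    t = np.tanh(y)                 # forward through tanh (cached)
    ubar = B.T @ u                 # reverse through g
    ybar = ubar * (1 - t**2)       # reverse through tanh
    return A.T @ ybar              # reverse through h

x, u = rng.normal(size=3), rng.normal(size=7)
# Sanity check against the explicit Jacobian J = B diag(1 - tanh(Ax)^2) A
J = B @ np.diag(1 - np.tanh(A @ x) ** 2) @ A
assert np.allclose(vjp(x, u), u @ J)
```

A forward-mode Jv for the same f would traverse the chain in the opposite order (from x toward the output), which is why each product direction pairs naturally with one AD mode.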


Yes, it has mutation and some allocations too. It is type stable and works with ForwardDiff.Dual.

Then my advice holds, and DI will allow for easy benchmarking.


You may also want to check out DifferentiationInterfaceTest.jl for benchmarking.
