How to think about the cost of vJ and Jv (pullback and pushforward)

In a calculation involving a function f: R^n \to R^m, where n ranges from 50 to a few hundred and m \ge n is of a similar scale, I have the option of either

  1. calculating the Jacobian J once and saving it, or

  2. calculating Jv, Jw, uJ for three different vectors v, w, u.

I can (and will) benchmark this, but I am wondering whether there is a heuristic way to think about the cost: e.g. if calculating J has O(nm) cost, can we say something similar about the others?

And does this depend on the AD mode and the interface?

(I will be using the excellent DifferentiationInterface.jl so I can experiment freely)


First of all, a few questions:

  • Is the Jacobian sparse?
  • How many times are you performing each operation? Does everything vary each time, or do you have a single x, which determines J_f(x), and many different triplets (u, v, w)?

Say your function f costs \tau to compute on one input x. Roughly speaking:

  • A single JVP (forward mode) or VJP (reverse mode) can be computed in time O(\tau)
  • A dense forward-mode Jacobian can be computed in time O(n \tau) (one JVP per basis vector of the input space)
  • A dense reverse-mode Jacobian can be computed in time O(m \tau) (one VJP per basis vector of the output space)
  • The memory use of reverse mode is much higher, which implies that the multiplicative coefficient inside the O is bigger. In your case, you should probably use forward mode to compute the full Jacobian.
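The counting argument in the bullets above can be made concrete with a toy sketch. Here central finite differences stand in for a forward-mode JVP (both cost a constant number of function evaluations), and the toy function f, its dimensions, and the call counter are all made up for illustration:

```python
import numpy as np

n, m = 4, 6
calls = 0

def f(x):
    """Toy dense map R^n -> R^m; each evaluation costs one 'tau'."""
    global calls
    calls += 1
    A = np.arange(1.0, m * n + 1).reshape(m, n)
    return A @ x + np.sin(x).sum()

def jvp(f, x, v, eps=1e-6):
    """Directional derivative J v via central differences: 2 evaluations, i.e. O(tau)."""
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

x = np.random.default_rng(0).normal(size=n)

calls = 0
Jv = jvp(f, x, np.ones(n))
print("one JVP costs", calls, "evaluations")          # 2 evaluations ~ O(tau)

calls = 0
J = np.column_stack([jvp(f, x, e) for e in np.eye(n)])
print("dense Jacobian costs", calls, "evaluations")   # 2n evaluations ~ O(n tau)
```

The second measurement is exactly the "one JVP per basis vector of the input space" recipe: n directional derivatives assembled column by column.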

So, as far as I understand your question, it will usually be faster to compute (Jv, Jw, uJ) than the full Jacobian. But if the Jacobian is sparse or if you reuse the same J for multiple triplets (u, v, w), the answer may differ.


Thanks for the clarifying questions, I should have thought about them.

  1. the Jacobian is not sparse: it is more than 70% nonzero most of the time,
  2. the point at which the Jacobian is evaluated differs for each triplet, so there is a different J every time.

What are the recommended forward-mode backends for vJ and Jv at this point besides ForwardDiff.jl?

What does your function look like? Does it have mutation? Is it type-generic?

At first glance, you should evaluate Jv with a forward-mode backend like ForwardDiff.jl or Enzyme.jl, and vJ with a reverse-mode backend like Enzyme.jl or Mooncake.jl.
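To see why vJ is the natural reverse-mode primitive, consider a composition f = g ∘ h: the VJP chains transposed Jacobians from the output side in a single backward sweep, at O(\tau) cost. Below is a hand-written toy sketch in numpy (not the DifferentiationInterface.jl API; the matrices A, B and the function f are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))   # inner layer: h(x) = A x
B = rng.normal(size=(7, 5))   # outer layer: g(y) = B tanh(y)

def f(x):
    return B @ np.tanh(A @ x)

def vjp(x, u):
    """u J_f(x): one forward pass caching intermediates, one reverse sweep."""
    y = A @ x                      # forward through h
    t = np.tanh(y)                 # forward through tanh (cached)
    ubar = B.T @ u                 # reverse through g
    ybar = ubar * (1 - t**2)       # reverse through tanh
    return A.T @ ybar              # reverse through h

x, u = rng.normal(size=3), rng.normal(size=7)
# Sanity check against the explicit Jacobian J = B diag(1 - tanh(Ax)^2) A
J = B @ np.diag(1 - np.tanh(A @ x) ** 2) @ A
assert np.allclose(vjp(x, u), u @ J)
```

A forward-mode Jv for the same f would traverse the chain in the opposite order (from x toward the output), which is why each product direction pairs naturally with one AD mode.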


Yes, it has mutation and some allocations too. It is type stable and works with ForwardDiff.Dual.

Then my advice holds, and DI will allow for easy benchmarking.


You may also want to check out DifferentiationInterfaceTest.jl for benchmarking.
