In a calculation involving an f: R^n \to R^m function, where n ranges from 50 to a few hundred and m \ge n with a similar scale, I have the option of either
-
calculating the Jacobian J once and saving it, or
-
calculating Jv, Jw, uJ for three different vectors v, w, u.
I can (and will) benchmark this, but I am wondering if there is a heuristic way to think about the cost, eg if calculating J has O(nm) cost, can we say something similar about the others?
And does this depend on the AD mode and the interface?
(I will be using the excellent DifferentiationInterface.jl so I can experiment freely)
First of all, a few questions:
- Is the Jacobian sparse?
- How many times are you performing each operation? Does everything vary at each time, or are you in a setting where you have a single x, which determines J_f(x), and many different triplets (u, v, w)?
Say your function f costs \tau to compute on one input x. Roughly speaking:
- A single JVP (forward mode) or VJP (reverse mode) can be computed in time O(\tau)
- A dense forward-mode Jacobian can be computed in time O(n \tau) (one JVP per basis vector of the input space)
- A dense reverse-mode Jacobian can be computed in time O(m \tau) (one VJP per basis vector of the output space)
- The memory use of reverse mode is much higher, which implies that the multiplicative coefficient inside the O is bigger. In your case, you should probably use forward mode to compute the full Jacobian.
So, as far as I understand your question, it will usually be faster to compute (Jv, Jw, uJ) than the full Jacobian. But if the Jacobian is sparse or if you reuse the same J for multiple triplets (u, v, w), the answer may differ.
Thanks for the clarifying questions, I should have thought about them.
- the Jacobian is not sparse, it is more than 70% nonzeros most of the time,
- the point at which the Jacobian is evaluated is different for each triplet, so a different J every time.
What are the recommended forward-mode backends for vJ and Jv at this point besides ForwardDiff.jl?
What does your function look like? Does it have mutation? Is it type-generic?
At first glance, you should evaluate Jv with a forward-mode backend like ForwardDiff.jl or Enzyme.jl, and vJ with a reverse-mode backend like Enzyme.jl or Mooncake.jl.
Yes, it has mutation and some allocations too. It is type stable and works with ForwardDiff.Dual.
Then my advice holds, and DI will allow for easy benchmarking.
You may also want to check out DifferentiationInterfaceTest.jl for benchmarking