In a calculation involving a function f: R^n \to R^m, where n ranges from 50 to a few hundred and m \ge n is of a similar scale, I have the option of either
- calculating the Jacobian J once and saving it, or
- calculating Jv, Jw, and uJ for three different vectors v, w, u.
I can (and will) benchmark this, but I am wondering if there is a heuristic way to think about the cost. E.g., if calculating J has O(nm) cost, can we say something similar about the others?
And does this depend on the AD mode and the interface?
(I will be using the excellent DifferentiationInterface.jl so I can experiment freely)
First of all, a few questions:
- Is the Jacobian sparse?
- How many times are you performing each operation? Does everything vary each time, or are you in a setting where you have a single x, which determines J_f(x), and many different triplets (u, v, w)?
Say your function f costs \tau to compute on one input x. Roughly speaking:
- A single JVP (forward mode) or VJP (reverse mode) can be computed in time O(\tau)
- A dense forward-mode Jacobian can be computed in time O(n \tau) (one JVP per basis vector of the input space)
- A dense reverse-mode Jacobian can be computed in time O(m \tau) (one VJP per basis vector of the output space)
- The memory use of reverse mode is much higher, which implies that the multiplicative constant hidden inside the O is bigger. Since m \ge n in your case, you should probably use forward mode to compute the full Jacobian.
So, as far as I understand your question, it will usually be faster to compute (Jv, Jw, uJ) than the full Jacobian. But if the Jacobian is sparse or if you reuse the same J for multiple triplets (u, v, w), the answer may differ.
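As a concrete illustration, here is a minimal sketch with DifferentiationInterface (the toy function f and the dimensions are made up for the example; this assumes the DI API where pushforward/pullback seeds are passed and returned as tuples):

```julia
using DifferentiationInterface
import ForwardDiff, Enzyme

# toy stand-in for the real f: R^n -> R^m, here with m = 2n
f(x) = vcat(x .^ 2, sum(abs2, x) .* x)

n = 100
x, v, w, u = randn(n), randn(n), randn(n), randn(2n)

# dense Jacobian, forward mode: roughly n JVPs, cost O(n * tau)
J = jacobian(f, AutoForwardDiff(), x)

# single JVPs, forward mode: cost O(tau) each
(Jv,) = pushforward(f, AutoForwardDiff(), x, (v,))
(Jw,) = pushforward(f, AutoForwardDiff(), x, (w,))

# single VJP, reverse mode: cost O(tau)
(uJ,) = pullback(f, AutoEnzyme(), x, (u,))
```

Since you evaluate at a new point every time, also look at DI's preparation mechanism (prepare_jacobian, prepare_pushforward, prepare_pullback): preparation depends on the operator and array sizes rather than the specific point, so it can typically be reused across points.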
Thanks for the clarifying questions; I should have thought about them myself.
- the Jacobian is not sparse: it is more than 70% nonzeros most of the time,
- the point at which the Jacobian is evaluated is different for each triplet, so a different J every time.
What are the recommended forward-mode backends for vJ and Jv at this point besides ForwardDiff.jl?
What does your function look like? Does it have mutation? Is it type-generic?
At first glance, you should evaluate Jv with a forward-mode backend like ForwardDiff.jl or Enzyme.jl, and vJ with a reverse-mode backend like Enzyme.jl or Mooncake.jl.
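For instance, a quick sketch of those backend choices (the constructors come from ADTypes.jl and are re-exported by DI; the Enzyme mode keyword and the Mooncake config keyword are assumptions to check against the current docs):

```julia
using DifferentiationInterface
import Enzyme, Mooncake

f(x) = vcat(x .^ 2, sum(abs2, x) .* x)  # same toy stand-in as above
x, v, u = randn(100), randn(100), randn(200)

# Jv with a forward-mode backend
(Jv,) = pushforward(f, AutoEnzyme(; mode=Enzyme.Forward), x, (v,))

# uJ with a reverse-mode backend
(uJ,) = pullback(f, AutoMooncake(; config=nothing), x, (u,))
```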
Yes, it has mutation and some allocations too. It is type-stable and works with ForwardDiff.Dual.
Then my advice holds, and DI will allow for easy benchmarking.
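For example, a benchmarking sketch with a made-up mutating f!(y, x), using DI's two-argument operator variants and BenchmarkTools (swap in your real function, and compare a reverse backend for the VJP side as well):

```julia
using BenchmarkTools, DifferentiationInterface
import ForwardDiff

# hypothetical mutating stand-in for the real f: R^n -> R^(2n)
function f!(y, x)
    n = length(x)
    y[1:n] .= x .^ 2
    y[(n + 1):(2n)] .= sum(abs2, x) .* x
    return nothing
end

n = 100
x, v = randn(n), randn(n)
y = zeros(2n)
backend = AutoForwardDiff()

@btime jacobian($f!, $y, $backend, $x)
@btime pushforward($f!, $y, $backend, $x, ($v,))
```

To keep the comparison fair, consider benchmarking the prepared variants (prepare_jacobian and friends) so that setup cost is excluded from the timings.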
You may also want to check out DifferentiationInterfaceTest.jl for benchmarking.