Improve speed of vector-Jacobian product

Reverse-mode differentiation may not understand that each iteration of this loop updates a different element of `result` independently. I’m guessing that it saves a copy of the `result` array on every iteration of the loop, before running the loop backwards in order to backpropagate the vJp.
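
Schematically, I mean something like the following (made-up function and variable names, since I don’t know your exact code; this is a minimal sketch, not your actual computation):

```julia
# Made-up sketch of the kind of loop I mean (your real code differs):
# each iteration mutates one element of `result`, so a reverse-mode tool
# has to record every setindex!, which it may do by saving array state
# at each step of the loop.
function f_mutating(p, x)
    result = zeros(promote_type(eltype(p), eltype(x)), length(x))
    for i in eachindex(x)
        result[i] = p[i] * x[i]^2   # independent element-wise updates
    end
    return result
end

# The same thing written non-mutatingly, out of vectorized primitives
# that reverse-mode AD already has efficient built-in rules for:
f_vectorized(p, x) = p .* x .^ 2
```

The broadcast version never mutates an array, so the AD tool can apply its rule for broadcasting once instead of tracking each individual assignment.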

Moreover, you are effectively doing a sparse matrix–vector multiplication in which the elements of the matrix depend on your parameters, and AD tools often struggle with backpropagating through a sparse-matrix construction for reasons I explained in this thread: Zygote.jl: How to get the gradient of sparse matrix - #6 by stevengj
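
The usual workaround is to fix the sparsity pattern, treat only the nonzero values as differentiable, and supply the vJp for that step by hand. Here is a minimal sketch of what I mean using `ChainRulesCore.rrule` (the name `sparse_matvec`, the argument layout, and the fixed-pattern assumption are illustrative, not taken from your problem; real-valued data assumed for simplicity):

```julia
using SparseArrays, ChainRulesCore

# y = A*x where A = sparse(rows, cols, vals, m, n) has a fixed sparsity
# pattern and only the nonzero values `vals` depend on your parameters.
# Writing the product as a function of `vals` lets us supply the vJp by
# hand instead of asking AD to differentiate the sparse() constructor.
function sparse_matvec(vals, x, rows, cols, m, n)
    A = sparse(rows, cols, vals, m, n)
    return A * x
end

function ChainRulesCore.rrule(::typeof(sparse_matvec), vals, x, rows, cols, m, n)
    A = sparse(rows, cols, vals, m, n)
    y = A * x
    function sparse_matvec_pullback(ȳ)
        ȳ = unthunk(ȳ)
        # y[rows[k]] += vals[k] * x[cols[k]], so the cotangent of vals[k]
        # is ȳ[rows[k]] * x[cols[k]] (real case):
        v̄ = ȳ[rows] .* x[cols]
        x̄ = A' * ȳ   # ordinary adjoint mat-vec for the x cotangent
        return NoTangent(), v̄, x̄, NoTangent(), NoTangent(), NoTangent(), NoTangent()
    end
    return y, sparse_matvec_pullback
end
```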

I’ve typically found that you need to write a manual vJp for some step(s) in most reasonable-scale scientific problems, unless you are using a package like DiffEqFlux.jl that has done that for you. AD tools are more reliable for cookie-cutter ML-style problems where you are plugging together large components that they already know about, and are only fiddling around the edges with small scalar functions or code written in a functional/non-mutating style (e.g. composing vectorized primitives).
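
For example, once a hand-written rule like the sketch above is in place, it composes with the rest of the reverse-mode pass, and you can sanity-check it against the analytic answer (made-up data here):

```julia
using Zygote

# For loss = sum(w .* (A*x)) = Σₖ vals[k] * w[rows[k]] * x[cols[k]],
# the gradient with respect to vals[k] is just w[rows[k]] * x[cols[k]].
m, n = 4, 3
rows, cols = [1, 2, 4, 3], [1, 2, 3, 1]
vals, x, w = randn(4), randn(3), randn(4)

loss(vals) = sum(w .* sparse_matvec(vals, x, rows, cols, m, n))

g = Zygote.gradient(loss, vals)[1]
g ≈ w[rows] .* x[cols]   # analytic reference; should be true
```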
