Help with Automatic Differentiation of Stiffness Matrix Function

This form is equivalent to a vector–Jacobian product, which is what backpropagation (reverse mode) does for you.
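To make that equivalence concrete, here is a minimal NumPy sketch with a made-up 2×2 parameterized matrix (K(\zeta) = \zeta_0 A_0 + \zeta_1 A_1, not from any real FEM problem): the gradient of the scalar f(\zeta) = \lambda^T K(\zeta) u has components \lambda^T A_i u, which is exactly the vector–Jacobian product, verified here against finite differences:

```python
import numpy as np

# Hypothetical toy matrix: K(zeta) = zeta[0]*A0 + zeta[1]*A1,
# so dK/dzeta_i = A_i, and the gradient of f = lambda^T K(zeta) u
# has components lambda^T A_i u -- a vector-Jacobian product.
A0 = np.array([[2.0, -1.0], [-1.0, 2.0]])
A1 = np.array([[1.0,  0.0], [ 0.0, 3.0]])

def K(zeta):
    return zeta[0] * A0 + zeta[1] * A1

lam = np.array([0.5, -1.0])   # adjoint vector lambda
u   = np.array([1.0,  2.0])   # primal solution u

# Analytic gradient: g_i = lambda^T (dK/dzeta_i) u = lambda^T A_i u
grad = np.array([lam @ A0 @ u, lam @ A1 @ u])

# Sanity check against central finite differences of f(zeta):
zeta = np.array([1.0, 2.0])
f = lambda z: lam @ K(z) @ u
h = 1e-6
fd = np.array([(f(zeta + h * e) - f(zeta - h * e)) / (2 * h)
               for e in np.eye(2)])
assert np.allclose(grad, fd, atol=1e-6)
```

Note that the gradient is computed without ever forming a Jacobian of K with respect to \zeta; each component is one cheap contraction.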

I’ve used expressions of the form \lambda^T \frac{\partial K}{\partial \zeta} u many times, because they show up in adjoint methods for PDE solves (see also my course notes on adjoint methods and chapter 6 of our matrix calculus course notes), and my group does a lot of PDE-constrained shape and topology optimization. If you implement this explicitly, by separately computing \lambda and u yourself, it is absolutely critical to exploit the structure of \frac{\partial K}{\partial \zeta}. In particular, for finite-element methods \frac{\partial K}{\partial \zeta} is usually extremely sparse (even sparser than K: I’m talking O(1) nonzeros), since most parameters affect only a few elements, and exploiting that sparsity is what lets the computation scale to a vast number of parameters. If you just throw AD at the problem blindly, you run the danger of explicitly computing and storing a huge, mostly-zero matrix for each component of \zeta.
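As a (hypothetical) illustration of exploiting that sparsity, take a 1D bar discretized into n elements, where each design parameter \zeta_e scales only element e’s 2×2 local stiffness block. Then \frac{\partial K}{\partial \zeta_e} has exactly one nonzero block, and the full gradient costs O(n) work instead of n dense matrix contractions:

```python
import numpy as np

# Hypothetical 1D bar: n elements, n+1 nodes; element e contributes
# zeta[e] * k_local on DOFs (e, e+1), so dK/dzeta_e is zero except
# for a single 2x2 block.
n = 5
rng = np.random.default_rng(0)
zeta = rng.uniform(1.0, 2.0, n)
k_local = np.array([[1.0, -1.0], [-1.0, 1.0]])

lam = rng.standard_normal(n + 1)  # adjoint solution lambda
u   = rng.standard_normal(n + 1)  # primal solution u

# Sparse gradient: each component touches only 2 DOFs -> O(n) total work.
grad_sparse = np.array([lam[e:e+2] @ k_local @ u[e:e+2] for e in range(n)])

# Naive dense check: assemble the full dK/dzeta_e and contract it.
grad_dense = np.zeros(n)
for e in range(n):
    dK = np.zeros((n + 1, n + 1))
    dK[e:e+2, e:e+2] = k_local     # the only nonzero block
    grad_dense[e] = lam @ dK @ u

assert np.allclose(grad_sparse, grad_dense)
```

The dense loop is there only as a correctness check; it is exactly the mostly-zeros computation you want to avoid.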

It is still possible to apply AD, but you need to do so carefully. For example, apply forward-mode AD to the weak-form integrand of your FEM problem to get the derivative of the weak form, and then plug this into your matrix-assembly routines to do all of the \lambda^T K' u products at once (similar to this tutorial). Or apply reverse-mode AD to the whole problem at once rather than doing manual adjoint solves (though this assumes that reverse-mode AD can differentiate effectively through your entire solver, which is unfortunately often still not possible for PDE solvers).
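To sketch the first approach in a self-contained toy (a hand-rolled dual-number class standing in for a real forward-mode AD tool, and a made-up element routine k_e(\zeta_e) = \zeta_e^2 k_\text{ref}): forward mode differentiates the local stiffness routine, and the resulting derivative block is contracted with \lambda and u on the spot during assembly, so no global derivative matrix is ever formed:

```python
import numpy as np

# Minimal forward-mode AD via dual numbers (a stand-in for a real AD tool).
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

# Hypothetical element "integrand": k_e(zeta_e) = zeta_e**2 * k_ref,
# written generically so it evaluates on floats or on Duals.
k_ref = np.array([[1.0, -1.0], [-1.0, 1.0]])

def element_stiffness(z):
    return [[z * z * k_ref[i][j] for j in range(2)] for i in range(2)]

n = 4
zeta = np.array([1.5, 2.0, 0.5, 1.0])
lam  = np.array([0.3, -0.2, 0.7, 0.1, -0.5])  # adjoint solution
u    = np.array([1.0,  2.0, -1.0, 0.5,  3.0]) # primal solution

# Seed zeta_e with dot = 1, evaluate the element routine in forward mode,
# and contract the derivative block with lambda and u during "assembly":
grad = np.zeros(n)
for e in range(n):
    dk = element_stiffness(Dual(zeta[e], 1.0))   # entries are Duals
    grad[e] = sum(lam[e + i] * dk[i][j].dot * u[e + j]
                  for i in range(2) for j in range(2))

# Analytic check: d(zeta^2)/dzeta = 2*zeta, so the block is 2*zeta_e*k_ref.
expected = np.array([2 * zeta[e] * (lam[e:e+2] @ k_ref @ u[e:e+2])
                     for e in range(n)])
assert np.allclose(grad, expected)
```

In a real FEM code the forward-mode derivative would come from an AD tool applied to the weak-form integrand, but the pattern is the same: differentiate locally, contract immediately, never materialize \frac{\partial K}{\partial \zeta}.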

Basically, you have to think about how AD works and think about scalings and structure (e.g. sparsity) if you want to use it effectively in a large problem, especially if you are doing part of the differentiation manually (e.g. manual adjoint solves).
