Sparse matrix error with forward diff

The “gradient of a matrix” is not a thing, as I understand the gradient; this is not how the chain rule works with matrices. (ForwardDiff agrees with me: it throws `DimensionMismatch: gradient(f, x) expects that f(x) is a real number` if you try to compute `ForwardDiff.gradient(x -> M(x), x)` … your code just didn’t get that far.)
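To make the API constraint concrete (with a hypothetical tiny `M(x)`, not your code): `ForwardDiff.gradient` needs a scalar-valued function, and the analogue for array-valued functions is `ForwardDiff.jacobian`.

```julia
using ForwardDiff

# Hypothetical matrix-valued function of a parameter vector x (not your M):
M(x) = [x[1]  x[2];
        x[2]  x[1]^2]

x = [1.0, 2.0]

# ForwardDiff.gradient(M, x)  # throws the DimensionMismatch error quoted above

# For array-valued functions the analogue is the Jacobian (the output gets vec'd):
J = ForwardDiff.jacobian(M, x)   # 4×2 matrix
```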

A more explicit (and correct) version is:

$$
\frac{\partial \log \det M(x)}{\partial x_k} = \mathrm{trace} \left[ M^{-1} \frac{\partial M}{\partial x_k} \right]
$$

Yes, this still involves $M^{-1}$ applied to the derivative of $M$, which will in general be dense. But there are various things you can do depending on the structure of $M$ (e.g. if $\partial M / \partial x_k$ is sufficiently sparse, or perhaps low rank, you may only have to apply `M \ ...` to a few vectors; or you can use iterative trace-estimation techniques; or …). When matrices get big and sparse, you often need to think carefully about the structure of your specific problem (which is one of the difficulties with making AD fully automatic for large sparse-matrix calculations).
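For concreteness, here is a minimal sketch of that trace formula, under the assumption (mine, not from your post) that $M(x) = A + \sum_k x_k B_k$ for fixed sparse $A$ and $B_k$, so that $\partial M / \partial x_k = B_k$ is sparse and known in closed form:

```julia
using LinearAlgebra, SparseArrays

# Sketch: gradient of logdet(M(x)) with M(x) = A + sum(x[k]*Bs[k]),
# so that ∂M/∂x_k = Bs[k] is a fixed sparse matrix (an assumption for this example).
function logdet_gradient(A, Bs, x)
    M = A + sum(x[k] * Bs[k] for k in eachindex(x))
    F = lu(M)                                   # sparse LU factorization, reused for all solves
    map(Bs) do Bk
        g = 0.0
        for j in axes(Bk, 2)
            isempty(nzrange(Bk, j)) && continue # skip zero columns of ∂M/∂x_k
            col = F \ Vector(Bk[:, j])          # M \ (j-th column of ∂M/∂x_k)
            g += col[j]                         # accumulates tr(M⁻¹ ∂M/∂x_k)
        end
        g
    end
end
```

The point of the factor-once-then-solve structure is that if each $\partial M / \partial x_k$ has only a handful of nonzero columns (or is low rank), you only do a few sparse solves per parameter; if that’s not the case, stochastic trace estimators are the usual fallback.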

(I just ran into a 2019 preprint specifically on efficient computation of the gradient of the log–determinant for large, sparse matrices that may be worth looking over.)

But worse than exponential (factorial) complexity? If you are using sparse matrices, I’m assuming your matrices are huge. If your matrices are small-ish, you should just use dense matrices anyway, and ChainRules.jl knows how to differentiate through the log-determinant of a dense matrix.
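For example (a hypothetical small dense parameterization; Zygote picks up the ChainRules.jl rule for `logdet`):

```julia
using LinearAlgebra, Zygote   # Zygote uses the ChainRules.jl rule for logdet

# Hypothetical small dense parameterization, not your M(x):
A = [0.0 1.0; 1.0 0.0]
B = [2.0 0.0; 0.0 3.0]
f(x) = logdet(Matrix(1.0I, 2, 2) + x[1]*A + x[2]*B)

Zygote.gradient(f, [0.1, 0.2])   # works out of the box for dense matrices
```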
