Okay, so this is why you think you need the full matrix inverse — you are computing a log determinant? There are lots of methods for estimating the log determinant, e.g. by stochastic Lanczos quadrature + Hutchinson trace estimators. Moreover, since Hutchinson trace estimators are precisely in the form of an expected-value calculation, they are compatible with stochastic gradient descent (so you don’t need to compute the exact log determinant at each step in order to minimize it). These are iterative methods, so you only need matrix–vector products, which is useful if your matrix is large and sparse (or structured).
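For concreteness, here is a rough Julia sketch of that approach (function names and parameters are my own invention, not anyone's library API): a Hutchinson probe average combined with Lanczos quadrature to estimate logdet(A) = tr(log(A)) for a symmetric positive-definite A, touching A only through matrix–vector products.

```julia
using LinearAlgebra, Random

# Sketch: estimate logdet(A) = tr(log(A)) for symmetric positive-definite A
# via Hutchinson probing + stochastic Lanczos quadrature. Only A*v products
# are needed, so A can be sparse or any operator supporting `*`.
function slq_logdet(A; nprobe::Int=30, nlanczos::Int=30, rng=Random.default_rng())
    n = size(A, 1)
    acc = 0.0
    for _ in 1:nprobe
        z = rand(rng, [-1.0, 1.0], n)          # Rademacher probe, E[z*z'] = I
        acc += n * lanczos_logquad(A, z / sqrt(n), nlanczos)   # ≈ z' * log(A) * z
    end
    return acc / nprobe                        # ≈ tr(log(A)) = logdet(A)
end

# k-step Lanczos on A started from the unit vector q; the Gauss quadrature rule
# built from the resulting tridiagonal T approximates q' * log(A) * q.
# (No breakdown/reorthogonalization handling, for brevity.)
function lanczos_logquad(A, q, k)
    α = zeros(k)
    β = zeros(max(k - 1, 0))
    qprev = zero(q)
    for j in 1:k
        w = A * q
        j > 1 && (w -= β[j-1] * qprev)
        α[j] = dot(q, w)
        w -= α[j] * q
        j == k && break
        β[j] = norm(w)
        qprev, q = q, w / β[j]
    end
    θ, V = eigen(SymTridiagonal(α, β))         # T = V * Diagonal(θ) * V'
    return sum(abs2(V[1, i]) * log(θ[i]) for i in eachindex(θ))
end
```

For a moderately sized sparse SPD matrix, `slq_logdet(A)` should land within a few percent of the exact value, and you can trade accuracy for cost via `nprobe` and `nlanczos`. Since each probe term is an independent unbiased sample of the trace, the same structure plugs into a stochastic-gradient loop, as noted above.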
Alternatively, if you have an explicit Cholesky factorization C of a matrix A, you can compute logdet(A), or logdet(A⁻¹) = -logdet(A), directly from the Cholesky factor (just call logdet(C)).
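For example (a minimal sketch; the test matrix here is just illustrative):

```julia
using LinearAlgebra

B = randn(100, 100)
A = B'B + 100I        # a small SPD test matrix

C = cholesky(A)       # A == C.L * C.L'
logdet(C)             # logdet(A), mathematically 2 * sum(log, diag(C.U))
-logdet(C)            # logdet(inv(A)) = -logdet(A); no inverse is ever formed
```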
I would definitely recommend a single minimization rather than two nested minimizations if possible.