What is the most efficient way to compute the inverse of a sparse matrix?

I have a sparse matrix Q that is a covariance matrix for a Gaussian Process. I need to invert Q for computations in Bayesian full-conditional distributions. For example, I need to compute terms like \boldsymbol{\beta}^T\mathbf{X}^TQ^{-1}, where \mathbf{X}\boldsymbol{\beta} has the typical linear regression definition. I have found that inv(sparse(Q)) returns an error stating "The inverse of a sparse matrix can often be dense and can cause the computer to run out of memory..." The fastest solution I have to compute the inverse of Q is using inv(cholesky(Q)). However, this is still taking much longer than I’d like it to.

Throughout your Julia journeys, how have you handled inverting sparse matrices?

Rather than inverting, you should almost always use a linear solve. So, for example, Q \ X. If you want to do a bunch of solves with the same Q, you can compute c = cholesky(Q) once and then use c \ X repeatedly.
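To make the difference concrete, here is a minimal sketch with made-up sizes and a random sparse SPD matrix (stand-ins, not the original poster's Q and X), solving with a factorization instead of ever forming an inverse:

```julia
using LinearAlgebra, SparseArrays

# Illustrative stand-ins for Q and X (random data, not the OP's model)
n = 100
A = sprandn(n, n, 0.05)
Q = Symmetric(A' * A + 10I)   # sparse symmetric positive definite
X = randn(n, 3)

c = cholesky(Q)               # factor once (sparse CHOLMOD Cholesky)
sol = c \ X                   # solves Q * sol = X; inv(Q) is never formed
```

Reusing c for further right-hand sides then costs only triangular solves, not a new factorization.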


So, if I want to compute \boldsymbol{\beta}^T\mathbf{X}^TQ^{-1}, would I be able to use beta'*X'*Q\(beta'*X')? Or am I completely off?

As the warning indicates, sparse matrices seldom have sparse inverses. The best way to perform these sorts of calculations is with a factorization of your Q matrix rather than an explicit inverse. Factorizations of sparse matrices can often themselves be sparse, which is one reason this is preferable. Since your Q is positive definite, you’ve already stumbled across the cholesky function. You can use the resulting factorization object directly in calculations. For example, your \boldsymbol{\beta}^T\mathbf{X}^T Q^{-1} calculation can be done as cholQ = cholesky(Q) followed by β' * X' / cholQ.
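A small runnable sketch of this (random stand-in data, illustrative sizes). Since Q is symmetric, \boldsymbol{\beta}^T\mathbf{X}^T Q^{-1} = (Q^{-1}\mathbf{X}\boldsymbol{\beta})^T, so one backslash solve followed by a transpose does the job:

```julia
using LinearAlgebra, SparseArrays

# Illustrative stand-ins (random data, not the OP's model)
n, p = 100, 4
A = sprandn(n, n, 0.05)
Q = Symmetric(A' * A + 10I)   # sparse SPD covariance stand-in
X = randn(n, p)
beta = randn(p)

cholQ = cholesky(Q)
# Q symmetric  =>  betaᵀ Xᵀ Q⁻¹ == (Q⁻¹ X beta)ᵀ: one solve, then a transpose
row = (cholQ \ (X * beta))'
```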


I see. What if I wanted to do a calculation such as (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'*Q^{-1}*(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})? Would I be able to use (y-X*beta)'*(y-X*beta)/cholQ, or am I violating some rules by bringing cholQ to the end?

Matrix multiplication isn’t commutative, so you have to apply the operations in the right order (you can’t just move the cholQ division to the end), but other than that this approach should work.


Matrix multiplication is not commutative, so A * B != B * A except under specific circumstances that you should not usually count on. By extension, you cannot move the Q^{-1} to the end.

The correct way to write (y - X\beta)^\top Q^{-1} (y - X\beta) in code is ((y - X * beta)' / cholQ) * (y - X * beta) or (y - X * beta)' * (cholQ \ (y - X * beta)). Note that / and \ are used to invert the matrix on the right or left of a multiplication, respectively. In this case I would write this as

cholQ = cholesky(Q)
r = y - X * beta
z = r' * (cholQ \ r) # or (r' / cholQ) * r

to avoid computing y - X * beta twice, since the compiler will probably not be smart enough to spot that redundancy on its own.
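A self-contained version of the snippet above, at illustrative sizes with random stand-in data, checked against the explicit dense inverse (affordable only because the toy problem is small):

```julia
using LinearAlgebra, SparseArrays

# Illustrative sizes (random data, not the OP's model)
n, p = 50, 3
A = sprandn(n, n, 0.1)
Q = Symmetric(A' * A + 5I)    # sparse SPD stand-in
X = randn(n, p)
beta = randn(p)
y = randn(n)

cholQ = cholesky(Q)
r = y - X * beta
z = r' * (cholQ \ r)          # quadratic form rᵀQ⁻¹r; no inverse is formed
```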


If you need to compute this value many times for different y - X\beta but a fixed, already Cholesky-factorized Q, then you can save some time by exploiting the symmetry of the factorization (one triangular backward substitution instead of two)

r = y - X * beta
w = cholQ.factors.PtL \ r

and of course you can try to save some allocations by using in-place calculations.


The .factors.PtL property must be new in v1.9, or is otherwise unavailable in my version of v1.8 at least (.factors is simply returning the raw underlying storage and has no special properties). In the meantime, if one is using a non-pivoted Cholesky decomposition one can use w = cholQ.L \ r. If pivoted, one can use w = cholQ.L \ r[cholQ.p].


Sorry, I was copying a piece of code from a version where we had added an extra layer on top.

w = cholQ.PtL \ r

should work
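A hedged sketch (random stand-in data) checking that the single solve with cholQ.PtL reproduces the quadratic form obtained from the full two-sided solve:

```julia
using LinearAlgebra, SparseArrays

# Random stand-in data at toy size
n = 50
A = sprandn(n, n, 0.1)
Q = Symmetric(A' * A + 5I)
r = randn(n)

cholQ = cholesky(Q)           # CHOLMOD: Q == cholQ.PtL * cholQ.UP
w = cholQ.PtL \ r             # one solve instead of two
z = w' * w                    # equals rᵀQ⁻¹r by symmetry of the factorization
```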


Is there any way that I can use Cholesky factorizations to reduce the amount of memory (RAM) needed to compute r'Q^{-1}r? Right now, my r is 576x165600, so w is 576x165600, and my 64GB of RAM is still not enough to compute z = w' * w.

Maybe @mikmoore would have an idea, too.

I hadn’t realized that r (and y and beta) were matrices and not vectors.

The issue is that you’re trying to compute a 165600x165600 matrix, which has 27 billion elements. For a dense matrix at Float64 precision this is 220GB. If w is sparse, you can call sparse(w) to make it sparse again and then w'*w will be sparse and might be much smaller.

But I’m not sure I would expect w to actually be sparse. Further, you’re computing a big matrix which is 165600x165600 but is only rank 576 (or less), which is almost always the wrong thing to do even if it is sparse.

I would try to refactor your downstream calculations to exploit this. For example, instead of computing (w' * w) * v, which will fail at the w' * w step, compute w' * (w * v), which will succeed unless v has a very large number of columns.
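The regrouping trick at reduced scale (the sizes below are small stand-ins for the 576x165600 case):

```julia
using LinearAlgebra

# Scaled-down stand-ins for the 576x165600 problem
m, k = 30, 2000
w = randn(m, k)
v = randn(k)

# w' * w would be k-by-k (the huge matrix); regrouping avoids it entirely:
zv = w' * (w * v)             # a k-vector; w'w is never materialized
```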


Can you clarify? Is it r' Q^{-1} r or r Q^{-1} r' that you are calculating; in other words, what is the size of Q?

Also, what do you want to use z for? If you multiply it by a vector, for example z * v, you could compute (w' * (w * v)) instead and save memory.


Surprisingly, w is sparse, but I agree that computing the big matrix seems wrong.

Are you suggesting that v is a variable that I already mentioned, or was that just for an example?

Just an example. It is rare that one is interested in the huge matrix z in itself. Generally you want to do something with it, and the suggestion is to look at how to organize the subsequent calculations, exemplified here by a multiplication with a vector.


I am trying to compute r'Q^{-1}r, so Q is 576x576.

Ultimately, I am going to use z for a symmetric Woodbury inverse. So for reference, r=(A^{-1}U)', Q=C^{-1} + U'A^{-1}U, and V=U'.
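Given that setup, here is a hedged sketch (toy sizes, random stand-in data; A assumed cheap to solve with, taken diagonal here for illustration) of applying the symmetric Woodbury inverse to a vector using only the small Q factorization, so the huge n-by-n matrix is never formed:

```julia
using LinearAlgebra

# Toy stand-ins: n and k play the roles of 165600 and 576
n, k = 200, 5
A = Diagonal(rand(n) .+ 1)    # assumption: solves with A are cheap
M = randn(k, k)
C = Symmetric(M' * M + I)     # small SPD matrix
U = randn(n, k)
x = randn(n)

Q = inv(Matrix(C)) + U' * (A \ U)   # the small k-by-k Woodbury matrix
cholQ = cholesky(Symmetric(Q))

# Woodbury: (A + U C Uᵀ)⁻¹ x = A⁻¹x - A⁻¹U * Q⁻¹ * UᵀA⁻¹x, applied right to left
t = A \ x
sol = t - A \ (U * (cholQ \ (U' * t)))
```

Every intermediate here is either n-by-k or k-by-k, so the memory cost stays linear in n.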