I have a sparse matrix Q that is a covariance matrix for a Gaussian Process. I need to invert Q for computations in Bayesian full-conditional distributions. For example, I need to compute terms like \boldsymbol{\beta}^T\mathbf{X}^TQ^{-1}, where \mathbf{X}\boldsymbol{\beta} has the typical linear regression definition. I have found that inv(sparse(Q))
returns an error stating "The inverse of a sparse matrix can often be dense and can cause the computer to run out of memory..."
The fastest solution I have to compute the inverse of Q is using inv(cholesky(Q))
. However, this is still taking much longer than I’d like it to.
Throughout your Julia journeys, how have you handled inverting sparse matrices?
Rather than inverting, you should almost always use a linear solve. So, for example, Q \ X. If you want to do a bunch of solves, you can use c = cholesky(Q) and c \ X.
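For example, a minimal self-contained sketch (the Q and X below are invented toy data, not your actual problem):
using LinearAlgebra, SparseArrays
n = 100
Q = spdiagm(0 => fill(2.0, n), 1 => fill(-1.0, n - 1), -1 => fill(-1.0, n - 1))  # toy sparse SPD matrix
X = randn(n, 3)
Y1 = Q \ X       # solves Q * Y1 = X without ever forming inv(Q)
c = cholesky(Q)  # factorize once (sparse CHOLMOD Cholesky) ...
Y2 = c \ X       # ... and reuse the factorization for every solve
Y1 ≈ Y2          # true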
4 Likes
So, if I want to compute \boldsymbol{\beta}^T\mathbf{X}^TQ^{-1}, would I be able to use beta'*X'*Q\(beta'*X')
? Or am I completely off?
As the warning indicates, sparse matrices seldom have sparse inverses. The best way to perform these sorts of calculations is with a factorization of your Q
matrix, rather than an explicit inverse. Factorizations of sparse matrices often can be sparse, which is one reason why this is preferable. Since your Q
is positive definite, you’ve already stumbled across the cholesky
function. You can use this factorization object directly in calculations. For example, your \beta^T X^T Q^{-1} calculation can be done as cholQ = cholesky(Q) followed by β' * X' / cholQ.
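For instance, with invented stand-ins for Q, X, and beta (just to show the shapes and that the two spellings agree):
using LinearAlgebra, SparseArrays
n = 100
Q = spdiagm(0 => fill(2.0, n), 1 => fill(-1.0, n - 1), -1 => fill(-1.0, n - 1))  # toy sparse SPD matrix
X, beta = randn(n, 3), randn(3)
cholQ = cholesky(Q)
v1 = beta' * X' / cholQ      # row vector equal to beta' * X' * inv(Q), no explicit inverse formed
v2 = (cholQ \ (X * beta))'   # the same quantity via a left solve
v1 ≈ v2                      # true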
2 Likes
I see. What if I wanted to do a calculation such as (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'*Q^{-1}*(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})? Would I be able to use (y-X*beta)'*(y-X*beta)/cholQ
, or am I violating some rules by bringing cholQ
to the end?
Matrix multiplication isn’t commutative so you have to apply it in the right order, but other than that, this all should work.
2 Likes
Matrix multiplication is not commutative, so A * B != B * A
except under specific circumstances that you should not usually count on. By extension, you cannot move the Q^{-1} to the end.
The correct way to write (y - X\beta)^\top Q^{-1} (y - X\beta) in code is ((y - X * beta)' / cholQ) * (y - X * beta)
or (y - X * beta)' * (cholQ \ (y - X * beta))
. Note that /
and \
are used to invert the matrix on the right or left of a multiplication, respectively. In this case I would write this as
cholQ = cholesky(Q)
r = y - X * beta
z = r' * (cholQ \ r) # or (r' / cholQ) * r
to avoid computing y - X * beta
twice, since the compiler will probably not be smart enough to see that redundancy on its own.
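As a small self-contained sanity check (all data invented), both spellings agree with the explicit-inverse result:
using LinearAlgebra, SparseArrays
n = 50
Q = spdiagm(0 => fill(2.0, n), 1 => fill(-1.0, n - 1), -1 => fill(-1.0, n - 1))
y, X, beta = randn(n), randn(n, 2), randn(2)
cholQ = cholesky(Q)
r = y - X * beta
z1 = r' * (cholQ \ r)
z2 = (r' / cholQ) * r
z3 = r' * inv(Matrix(Q)) * r  # explicit dense inverse, only viable at toy size
z1 ≈ z2 ≈ z3                  # true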
5 Likes
If you need to calculate this value many times for different y - X\beta but a fixed, already Cholesky-factorized Q, then you could save some time by exploiting the symmetry of the factorization (one triangular backward substitution instead of two):
r = y - X * beta
w = cholQ.factors.PtL \ r
z = w' * w
and of course you can try to save some allocations by using in-place calculations.
1 Like
The .factors.PtL
property must be new in v1.9
, or is otherwise unavailable in my version of v1.8
at least (.factors
is simply returning the raw underlying storage and has no special properties). In the meantime, if one is using a non-pivoted Cholesky decomposition one can use w = cholQ.L \ r
. If pivoted, one can use w = cholQ.L \ r[cholQ.p]
.
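A small self-contained check of both variants with an invented dense SPD matrix:
using LinearAlgebra
n = 50
A = randn(n, n)
Qd = Symmetric(A' * A + n * I)   # invented dense SPD matrix
r = randn(n)
c = cholesky(Qd)                 # non-pivoted dense Cholesky: Qd == c.L * c.L'
w = c.L \ r
w' * w ≈ r' * (c \ r)            # true
cp = cholesky(Qd, RowMaximum())  # pivoted dense Cholesky (Julia >= 1.8 spelling)
wp = cp.L \ r[cp.p]
wp' * wp ≈ r' * (Qd \ r)         # true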
1 Like
Sorry, I was copying a piece of code from a version on which we had added a layer.
w = cholQ.PtL \ r
should work
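A minimal self-contained check with invented data; for the sparse CHOLMOD factorization the combined factor satisfies Q == cholQ.PtL * cholQ.PtL':
using LinearAlgebra, SparseArrays
n = 50
Q = spdiagm(0 => fill(2.0, n), 1 => fill(-1.0, n - 1), -1 => fill(-1.0, n - 1))
r = randn(n)
cholQ = cholesky(Q)
w = cholQ.PtL \ r          # single permuted triangular solve
w' * w ≈ r' * (cholQ \ r)  # true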
1 Like
Is there any way that I can use Cholesky factorizations to reduce the amount of computation (RAM) needed to compute rQ^{-1}r'? Right now, my r is 576x165600, so w is 576x165600, and my 64GB of RAM is still not enough to compute z = w' * w.
Maybe @mikmoore would have an idea, too.
I hadn’t realized that r
(and y
and beta
) were matrices and not vectors.
The issue is that you’re trying to compute a 165600x165600
matrix, which has 27 billion elements. For a dense matrix at Float64
precision this is 220GB. If w
is sparse, you can call sparse(w)
to make it sparse again and then w'*w
will be sparse and might be much smaller.
But I’m not sure I would expect w
to actually be sparse. Further, you’re computing a big matrix which is 165600x165600
but is only rank 576
(or less), which is almost always the wrong thing to do even if it is sparse.
I would try to refactor your following calculations to exploit this. For example, instead of computing (w' * w) * v
, which will fail on w' * w
, compute w' * (w * v)
which will succeed unless v
has a large number of columns.
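For example, with toy sizes standing in for 576x165600 (v here is a hypothetical vector you would later multiply by, not something from the original problem):
using LinearAlgebra, SparseArrays
n, m = 20, 1000   # toy stand-ins for 576 and 165600
Q = spdiagm(0 => fill(2.0, n), 1 => fill(-1.0, n - 1), -1 => fill(-1.0, n - 1))
r = randn(n, m)
cholQ = cholesky(Q)
w = cholQ.PtL \ r  # n x m, cheap to store
v = randn(m)       # hypothetical vector to apply z = w' * w to
zv = w' * (w * v)  # applies z to v using only n x m storage
zv ≈ (w' * w) * v  # true here; the m x m product on the right is infeasible at full size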
1 Like
Can you clarify? Is it r' Q^{-1} r or r Q^{-1} r' you calculate, or, in other words, what is the size of Q?
Also, what do you want to use z for? If you multiply it with a vector, for example z * v, you could do w' * (w * v) and save memory.
2 Likes
Surprisingly, w
is sparse, but I agree that computing the big matrix seems wrong.
Are you suggesting that v
is a variable that I already mentioned, or was that just for an example?
Just an example. It is rare that one is interested in a huge matrix z in itself. Generally you want to do something with it, and the suggestion is to look at how to organize the subsequent calculations, exemplified here with a multiplication by a vector.
3 Likes
I am trying to compute r'Q^{-1}r, so Q is 576x576
.
Ultimately, I am going to use z
for a symmetric Woodbury inverse. So for reference, r=(A^{-1}U)', Q=C^{-1} + U'A^{-1}U, and V=U'.
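In that case you may never need z as an explicit matrix at all. A sketch under those definitions (A, C, U, and b below are invented toy stand-ins, and A is assumed symmetric positive definite so that r = U'A^{-1}), applying the Woodbury inverse (A + UCU')^{-1} = A^{-1} - r'Q^{-1}r directly to a vector b:
using LinearAlgebra, SparseArrays
nbig, nsmall = 1000, 20  # toy stand-ins for 165600 and 576
A = spdiagm(0 => fill(3.0, nbig), 1 => fill(-1.0, nbig - 1), -1 => fill(-1.0, nbig - 1))
U = sprandn(nbig, nsmall, 0.02)
C = Matrix(2.0I, nsmall, nsmall)
b = randn(nbig)
cholA = cholesky(A)
AinvU = cholA \ Matrix(U)        # A^{-1}U, tall and skinny
r = AinvU'                       # r = (A^{-1}U)'
Q = inv(C) + U' * AinvU          # Q = C^{-1} + U'A^{-1}U, small
cholQ = cholesky(Symmetric(Q))
x = cholA \ b - r' * (cholQ \ (r * b))            # Woodbury inverse applied to b, no big matrix formed
x ≈ (Matrix(A) + Matrix(U) * C * Matrix(U)') \ b  # true; the dense check is only feasible at toy size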