I have a sparse matrix Q that is a covariance matrix for a Gaussian process. I need to invert Q for computations in Bayesian full-conditional distributions. For example, I need to compute terms like \boldsymbol{\beta}^T\mathbf{X}^T Q^{-1}, where \mathbf{X}\boldsymbol{\beta} has the typical linear regression definition. I have found that `inv(sparse(Q))` returns an error stating `"The inverse of a sparse matrix can often be dense and can cause the computer to run out of memory..."`.

The fastest solution I have for computing the inverse of Q is `inv(cholesky(Q))`. However, this is still taking much longer than I’d like it to.

Throughout your Julia journeys, how have you handled inverting sparse matrices?

Rather than inverting, you should almost always use a linear solve, for example `Q \ X`. If you want to do a bunch of solves, you can compute `c = cholesky(Q)` once and then use `c \ X`.
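For instance, a minimal sketch of that pattern; the tridiagonal `Q` below is just a stand-in for a real sparse covariance:

```
using LinearAlgebra, SparseArrays

# Toy SPD sparse matrix standing in for the real Q
n = 1000
Q = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(4.0, n), 1 => fill(-1.0, n - 1))
X = randn(n, 3)

c = cholesky(Q)  # sparse Cholesky (CHOLMOD), computed once
sol = c \ X      # Q⁻¹X via triangular solves; no explicit inverse formed
```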

4 Likes

So, if I want to compute \boldsymbol{\beta}^T\mathbf{X}^T Q^{-1}, would I be able to use `beta'*X'*Q\(beta'*X')`? Or am I completely off?

As the warning indicates, sparse matrices seldom have sparse inverses. The best way to perform these sorts of calculations is with a factorization of your `Q` matrix, rather than an explicit inverse. Factorizations of sparse matrices can often be sparse, which is one reason why this is preferable. Since your `Q` is positive definite, you’ve already stumbled across the `cholesky` function. You can use this factorization object directly in calculations. For example, your \boldsymbol{\beta}^T\mathbf{X}^T Q^{-1} calculation can be done as `cholQ = cholesky(Q)` followed by `β' * X' / cholQ`.
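As a hedged sketch of that recipe (all sizes and values below are placeholders, not the real model):

```
using LinearAlgebra, SparseArrays

# Placeholder problem: n observations, p coefficients
n, p = 1000, 4
Q = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(4.0, n), 1 => fill(-1.0, n - 1))
X = randn(n, p)
β = randn(p)

cholQ = cholesky(Q)
v = β' * X' / cholQ      # 1×n row vector β'X'Q⁻¹, no inverse materialized
v2 = (cholQ \ (X * β))'  # equivalent form written as a left solve
```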

2 Likes

I see. What if I wanted to do a calculation such as (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T Q^{-1} (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})? Would I be able to use `(y-X*beta)'*(y-X*beta)/cholQ`, or am I violating some rules by bringing `cholQ` to the end?

Matrix multiplication isn’t commutative, so you have to apply it in the right order, but other than that, this should all work.

2 Likes

Matrix multiplication is not commutative, so `A * B != B * A` except under specific circumstances that you should not usually count on. By extension, you cannot move the Q^{-1} to the end.

The correct way to write (y - X\beta)^\top Q^{-1} (y - X\beta) in code is `((y - X * beta)' / cholQ) * (y - X * beta)` or `(y - X * beta)' * (cholQ \ (y - X * beta))`. Note that `/` and `\` are used to invert the matrix on the right or left of a multiplication, respectively. In this case I would write this as

```
cholQ = cholesky(Q)
r = y - X * beta
z = r' * (cholQ \ r) # or (r' / cholQ) * r
```

to avoid computing `y - X * beta` twice, since the compiler will probably not be smart enough to see that redundancy on its own.

5 Likes

If you need to compute this value many times, for different y - X\beta but a fixed and already-Cholesky-factorized Q, then you can save some time by exploiting the symmetry of the factorization (one triangular back-substitution instead of two):

```
r = y - X * beta
w = cholQ.factors.PtL \ r
z = w' * w
```

and of course you can try to save some allocations by using in-place calculations.
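For example, a minimal sketch of the in-place idea for the residual (sizes here are illustrative only; `mul!` is the standard in-place multiply from `LinearAlgebra`):

```
using LinearAlgebra

# Illustrative sizes only
y, X, beta = randn(100), randn(100, 4), randn(4)

r = similar(y)    # allocate the residual buffer once, reuse every iteration
mul!(r, X, beta)  # r = X * beta without allocating a temporary
r .= y .- r       # r = y - X * beta, in place
```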

1 Like

The `.factors.PtL` property must be new in `v1.9`, or is otherwise unavailable in my version of `v1.8` at least (`.factors` simply returns the raw underlying storage and has no special properties). In the meantime, if one is using a non-pivoted Cholesky decomposition, one can use `w = cholQ.L \ r`. If pivoted, one can use `w = cholQ.L \ r[cholQ.p]`.
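For instance, a quick sanity check of the identity w^\top w = r^\top Q^{-1} r on a small dense example, for both cases (assuming `RowMaximum()`, the `v1.8` pivoting API, for the pivoted variant):

```
using LinearAlgebra

Q = [4.0 1.0; 1.0 3.0]          # small SPD example
r = [1.0, 2.0]

cholQ = cholesky(Q)              # non-pivoted
w = cholQ.L \ r                  # one triangular solve
w' * w ≈ r' * (Q \ r)            # true: w'w == r'Q⁻¹r

cp = cholesky(Q, RowMaximum())   # pivoted
wp = cp.L \ r[cp.p]              # permute the right-hand side first
wp' * wp ≈ r' * (Q \ r)          # also true
```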

1 Like

Sorry, I was copying a piece of code from a version on which we had added an extra layer.

```
w = cholQ.PtL \ r
```

should work.

1 Like

Is there any way I can use Cholesky factorizations to reduce the amount of memory (RAM) needed to compute r Q^{-1} r'? Right now, my `r` is `576x165600`, so `w` is also `576x165600`, and my 64GB of RAM is still not enough to compute `z = w' * w`.

Maybe @mikmoore would have an idea, too.

I hadn’t realized that `r` (and `y` and `beta`) were matrices and not vectors.

The issue is that you’re trying to compute a `165600x165600` matrix, which has about 27 billion elements. For a dense matrix at `Float64` precision, this is roughly 220GB. If `w` is sparse, you can call `sparse(w)` to make it sparse again, and then `w'*w` will be sparse and might be much smaller.

But I’m not sure I would expect `w` to actually be sparse. Further, you’re computing a big matrix which is `165600x165600` but only rank `576` (or less), which is almost always the wrong thing to do even if it is sparse.

I would try to refactor your subsequent calculations to exploit this. For example, instead of computing `(w' * w) * v`, which will fail on `w' * w`, compute `w' * (w * v)`, which will succeed unless `v` has a large number of columns.
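A minimal sketch of that reordering (shapes shrunk down from the real 576×165600 case):

```
using LinearAlgebra

w = randn(5, 2000)   # short-and-wide stand-in for the 576×165600 w
v = randn(2000)

# (w' * w) * v would materialize a 2000×2000 (rank-5) matrix first.
# Reassociating keeps every intermediate skinny and never forms w'w:
z = w' * (w * v)
```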

1 Like

Can you clarify? Is it r' Q^{-1} r or r Q^{-1} r' that you calculate? In other words, what is the size of Q?

Also, what do you want to use `z` for? If you multiply it with a vector, for example `z * v`, you could instead do `w' * (w * v)` and save memory.

2 Likes

Surprisingly, `w` is sparse, but I agree that computing the big matrix seems wrong.

Are you suggesting that `v` is a variable that I already mentioned, or was that just for an example?

Just an example. It is rare that one is interested in a huge matrix z in itself. Generally you want to do something with it, and the suggestion is to look at how to organize the subsequent calculations, exemplified here with a multiplication by a vector.

3 Likes

I am trying to compute r' Q^{-1} r, so Q is `576x576`.

Ultimately, I am going to use `z` for a symmetric Woodbury inverse. So for reference, r = (A^{-1}U)', Q = C^{-1} + U' A^{-1} U, and V = U'.