The fastest way to calculate matrix inversion

yingqiuz · June 14, 2021, 2:27pm

Hi all

I need to calculate the inverse of a positive definite matrix H of the form H = (X’ * X + Diagonal(d)). X is a flat matrix, having (dominantly) more columns than rows. d consists of positive, large values.

The most far I could get is to turn the problem into finding the inverse of (I + X * Diagonal(1 ./ d) * X) using the Woodbury formula. I wonder if there is a particular method to speed up the matrix inversion, considering that it has a specific structure? cholesky!(I + X * Diagonal(1 ./ d) * X) or inv(I + X * Diagonal(1 ./ d) * X), or anything that makes more sense?

Thank you very much. Any suggestions would be much appreciated!

stevengj · June 14, 2021, 2:42pm

Are you sure you need the inverse? Why can’t you use the Cholesky factorization as-is?

yingqiuz · June 14, 2021, 2:46pm

Hi thank you for your reply. Are you suggesting that I can use C = cholesky!(I + X * Diagonal(1 ./ d) * X) such that its inverse is C.U \ (C.L \ I)?

I do need the inverse (more preferably), or at least the diagonals of the inverse…

stevengj · June 14, 2021, 3:01pm

No, I was suggesting that you consider whether you can use the Cholesky factors without explicitly computing the inverse. That is, given C = cholesky!(X'X + Diagonal(d)), you can solve a linear system for any given right-hand-side quickly, so in many cases you don’t need the inverse matrix explicitly.

If you really need the whole inverse matrix, I would suggest

LinearAlgebra.inv!(cholesky!(X'X + Diagonal(d)))

(I don’t see the point of your Diagonal(1 ./ d) reformulation.)

I do need the inverse (more preferably), or at least the diagonals of the inverse…

What are you ultimately trying to compute, and what are your typical matrix sizes?

yingqiuz · June 14, 2021, 9:42pm

Thank you so much for your reply. I suppose I do need the inverse - it is an intermediate step in my iterations, and I need the diagonal of the inverse and it’s determinant to be fed into the next iteration.

Calculating the determinate does not require inversion, though, but I was not able to find how to extract the diagonal of the matrix inverse without cubic complexity…

Given the cholesky decomposition, is there a faster way to only calculate the diagonal of the inverse? Unfortunately, the matrix of interest is huge, at least 30000 x 30000… but it grows smaller along the iterations.

Thank you very much!

stevengj · June 14, 2021, 10:05pm

What is your iteration trying to do? There might be some other algorithm for your ultimate goal…

yingqiuz · June 14, 2021, 10:30pm

Thank you so much - my iteration is trying to find the optimal value of a parameter that minimises a function, e.g., argmin_{w} f(X, w, a). I need the negative inverse of the Hessian to approximate w’s (posterior) distribution, at least in the final iteration. In the intermediate iterations, I (only) need the diagonal of the negative inverse of Hessian, which is necessary to optimise the other parameter, a.

The Hessian of w has a particular structure, (X’X + Diagonal(d)). Unfortunately the dimensions of X and w are huge - X 30,000 by 2,000,000 and w 2,000,000 by 1 - which renders it impossible to calculate the diagonal of its negative inverse Hessian. Thus I exploited the Woodbury identity (A + BD^-1C)^-1 = A^-1 - A^-1B(D + CA^-1B)^-1CA^-1 ,assuming A is the diagonal matrix Diagonal(d) here, B being X’, C being X, and D being the identity matrix, such that the inversion can be conducted on a matrix of much smaller scale, (I + X * Diagonal(1 ./ d) * X’). I only need the diagonal elements of its inverse.

In the final iteration, I need the entire negative inverse Hessian, but it would be of much smaller size.

Thank you so much for taking the time to read this - is there a better approach to do it? I considered BFGS, but storing the actual (approximated) inverse Hessian at each step is very expensive as well.

stevengj · June 14, 2021, 10:40pm

Why? You still haven’t said what function you are minimizing.

Have you considered L-BFGS?

yingqiuz · June 14, 2021, 11:19pm

Thanks - there are two steps in my iteration. Suppose X is of dimension n by d, w and a are of d by 1.

solve argmin_{w}f(X, t, w, a) = y’y + 0.5 * w’ * Diagonal(a) * w, keeping a fixed.
Here y = t .* log.( sigmoid.(X * w)) .+ (1 .- t) .* log(1 .- sigmoid.(X * w)). t is of n by 1.
after finding the optimal value of w, keeping it fixed, and solve argmin_{a}f2(X, t, w, a) = y’y + 0.5 * w’ * Diagonal(a) * w + 0.5sum(log.(a)) + 0.5logdet(Sigma). Here Sigma is the negative inverse Hessian of w at its optimal value, and the Hessian of w is of the form (X’ * Diagonal(y .* (1 .- y)) * X + Diagonal(a))

It is a relevance vector machine problem.

Actually I have been baffled by this problem for quite a while. f(X, t, w, a) is the posterior of w, and f2(X, t, w, a) is the function with w integrated out (i.e., type 2 likelihood). As the exact integration is intractable, I used Laplace approximate as people usually do, calculating f2(X, t, w, a) as an approximation of the type 2 likelihood. My friends are suggesting doing SGD on f(X, t, w, a) to find w and a jointly in order to avoid f2, but I haven’t investigated if this would yield good results. …

How close is the approximate inverse Hessian to the actual inverse Hessian, yielded by program like BFGS/L-BFGS?

Again thank you so much for your time.

stevengj · June 15, 2021, 12:14am

Okay, so this is why you think you need the full matrix inverse — you are computing a log determinant? There are lots of methods for estimating the log determinant, e.g. by stochastic Lanczos quadrature + Hutchinson trace estimators. Moreover, since Hutchinson trace estimators are precisely in the form of an expected-value calculation, they are compatible with stochastic gradient descent (so you don’t need to compute the exact log determinant at each step in order to minimize it). These are iterative methods, so you only need matrix–vector products, which is useful if your matrix is large and sparse (or structured).

Alternatively, if you have the explicit Cholesky factorization C of a matrix A, you can compute logdet(A) or logdet(A⁻¹)=-logdet(A) directly from the Cholesky factors (just call logdet(C))

I would definitely recommend a single minimization rather than two nested minimizations if possible.

Topic		Replies	Views
Inverse with destination? General Usage	35	4247	May 19, 2021
Most efficient way of computing B'* inv(A)* B for A large Specific Domains linearalgebra	30	817	September 30, 2020
Explicit Inversion of a random matrix performs the same as Symmetric & SPD Performance question , linearalgebra	6	1053	October 25, 2018
Can I avoid inverting a matrix? General Usage question , linearalgebra	4	396	October 16, 2023
Is there a more efficient way to compute the log-density of a Multivariate normal with a Symmetric Woodbury Covariance? General Usage	3	162	May 4, 2023

The fastest way to calculate matrix inversion

Related topics