Square root of the inverse of a matrix

mleprovost · June 16, 2020, 5:56pm

Hello,

I am working on square root factorization of the matrix of the form A = I + U'U which is typically positive semi-definite of dimension N \sim 100.

I need to solve a least square problem A^{-1}Y and compute the product \sqrt{A^{-1}} R, where Y a non-square matrix and R an orthogonal matrix.

What is the best way to perform these operations? I think it might be advantageous to do a matrix factorization of A but unfortunately, sqrt doesn’t work with eigenfactorizations.

using Random
using RandomMatrices

N = 100
Nx = 50
U = randn(N, Nx)
A = I + U'*U

Y = randn(Nx,N)

# First operation
A\Y

# Generate Random Matrices
d = Haar(1)
R = rand(d, N)

# Second operation
real.(sqrt(inv(A))*R

jlchan · June 16, 2020, 6:30pm

If the matrix is symmetric positive definite, how about pre computing the eigen decomposition and using that to compute both the inverse and square root?

mleprovost · June 16, 2020, 6:41pm

Yes, that’s true. I wish I could just provide the Eigen object to the blackslash function\ and sqrt but it doesn’t work.
Instead, I think that I have to do it by hand, by looping over the eigenvalues, right?

Mason · June 16, 2020, 6:55pm

You should tell julia that A is hermitian to avoid generating a complex matrix and then dropping the imaginary values
The inverse and square root commute here, so you should be able to instead calculate

sqrt(Hermitian(A))\R

which should be equivalent and faster than real.(sqrt(inv(A))*R. However, someone who knows more will likely have a better idea that will avoid repeating the factorization multiple times.

antoine-levitt · June 16, 2020, 6:56pm

You can simply do this with E.vectors * Diagonal( 1 ./ sqrt.(E.values)) * E.vectors'. In my experience matrix square roots are rarely needed, and Cholesky factorizations are often just as good. Also, an SVD of U can be more stable.

Mason · June 16, 2020, 7:35pm

So to put this all together, here’s how I understand this should be approached.

You can use your favorite factorization as it suits you. I tested svd and eigen and found svd was a bit slower, but as Antoine said, it can be more stable:

using Random, RandomMatrices
function f1(A, Y, R)
    (op1=A\Y, 
     op2=real.(sqrt(inv(A)))*R)
end
function f2(A::Hermitian, Y, R)
    (op1 = A\Y, 
     op2 = sqrt(A)\R)
end
function f3(A::Hermitian, Y, R)
    λ, ϕ = eigen(A)
    
   (op1 = ϕ * Diagonal(1 ./ λ) * ϕ' * Y,
    op2 = ϕ * Diagonal(1 ./ sqrt.(λ)) * ϕ' * R)
end

let N = 100, Nx = 50
    U = randn(Nx, N) #switched Nx and N so the code runs
    A = I + U'*U
    
    Y = randn(N, Nx)  #switched Nx and N so the code runs

    # Generate Random Matrices
    d = Haar(1)
    R = rand(d, N)

    r1 = @btime f1($A, $Y, $R)
    r2 = @btime f2($(Hermitian(A)), $Y, $R)
    r3 = @btime f3($(Hermitian(A)), $Y, $R)
    Tuple(r1) .≈ Tuple(r2) .≈ Tuple(r3) #test that all the outputs are the same
end

#+RESULTS:
:   9.683 ms (31 allocations: 1.38 MiB)
:   1.781 ms (33 allocations: 705.03 KiB)
:   1.149 ms (31 allocations: 704.16 KiB)
: (true, true)

mleprovost · June 16, 2020, 7:38pm

Thank you @Mason, this is very nice of you!!

stevengj · June 16, 2020, 7:39pm

What is the context where you need this? (In many cases where you might ostensibly use a matrix square root, you can use Cholesky factors instead.)

mleprovost · June 16, 2020, 7:52pm

I am working on the filtering problem for nonlinear systems. I am using the ensemble transform Kalman filter which is an improved version of the ensemble Kalman filter to assimilate measurements in the state estimate. Bishop et al. https://journals.ametsoc.org/mwr/article/129/3/420/66417/Adaptive-Sampling-with-the-Ensemble-Transform uses a square-root factorization of the covariance propagation to produce a consistent analysis step.

The covariance matrix P^f, P^a before and after the assimilation of measurements are related by:

P^a = (I - KH) P^f, where K is the Kalman gain and H the observation matrix.

We can do the factorization P^a = X^a {X^a}^T where X^a = \frac{1}{\sqrt{N-1}}[x^1-\bar{x}, \ldots, x^N-\bar{x}], and the x^i are N samples from the state distribution and \bar{x} the sample mean.

When you do the algebra, you end up with an update equation for X^a that involves the square root of an expression with the Kalman gain.

Tamas_Papp · June 17, 2020, 9:14am

As suggested by @antoine-levitt, the SVD is very nice for Kalman filtering. Maybe you can adapt the standard tricks from the plain vanilla KF, eg see and the papers that cite it.

@inproceedings{wang1992kalman,
  title={Kalman filter algorithm based on singular value decomposition},
  author={Wang, Liang and Libert, Ga{\"e}tan and Manneback, Pierre},
  booktitle={Decision and Control, 1992., Proceedings of the 31st IEEE Conference on},
  pages={1224--1229},
  year={1992},
  organization={IEEE}
}

stevengj · June 17, 2020, 12:55pm

If you look at their algebra:

this seems to be precisely one of those cases where they could instead have used a Cholesky factorization R = LL^T. In particular, if you follow the same algebra and let \hat{H} = L^{-1} H (via cholesky(R).L \ H in Julia), then you obtain:

PH^T(HPH^T + R)^{-1} HP = PH^T\left[ L \left( L^{-1} H P H^T L^{-T} + I\right)L^T\right]^{-1} H P \\ = P \hat{H}^T \left( \hat{H} P \hat{H}^T + I\right)^{-1} \hat{H} P

with no matrix square root but a very similar form to their expression that seems to work equally well for the subsequent steps in the derivation.

(Note that this is closely related to Tikhonov/ridge-regularized least-squares, and there are a variety of techniques such, as the GSVD, for attacking such problems accurately.)

Mason · June 17, 2020, 4:28pm

Well, if it’s correct to use a Cholesky factorization here it’s by far the most performant option given in this thread yet:

using Random
using RandomMatrices
function f1(A, Y, R)
    (op1=A\Y, 
     op2=real.(sqrt(inv(A)))*R)
end
function f2(A::Hermitian, Y, R)
    (op1 = A\Y, 
     op2 = sqrt(A)\R)
end
function f3(A::Hermitian, Y, R)
    λ, ϕ = eigen(A)
    (op1 = ϕ * Diagonal(1 ./ λ) * ϕ' * Y,
     op2 = ϕ * Diagonal( 1 ./ sqrt.(λ)) * ϕ' * R)
end
function f4(A::Hermitian, Y, R)
    C = cholesky(A)
    (op1 = C \ Y,
     op2 = C.L \ R)
end

let N = 100, Nx = 50
    U = randn(Nx, N) #switched Nx and N so the code runs
    A = I + U'*U
    
    Y = randn(N, Nx)  #switched Nx and N so the code runs

    # Generate Random Matrices
    d = Haar(1)
    R = rand(d, N)

    r1 = @btime f1($A, $Y, $R)
    r2 = @btime f2($(Hermitian(A)), $Y, $R)
    r3 = @btime f3($(Hermitian(A)), $Y, $R)
    r4 = @btime f4($(Hermitian(A)), $Y, $R)

    (same_results = Tuple(r1) .≈ Tuple(r4), 
     same_eigvalues = eigvals(r4.op2) ≈ eigvals(r1.op2),
     same_singvalues = svd(r4.op2).S ≈ svd(r1.op2).S)  
end

:RESULTS:
:   9.786 ms (31 allocations: 1.38 MiB)
:   1.780 ms (33 allocations: 705.03 KiB)
:   1.173 ms (31 allocations: 704.16 KiB)
:   117.878 μs (10 allocations: 273.80 KiB)
: (same_results = (true, false), same_eigvalues = false, same_singvalues = true)
:END:

As you can see, you don’t get the same answer for the second operation when you use the Cholesky L factor, but that doesn’t mean it isn’t equivalent. The resulting matrix appears to have different eigenvalues, but the same singular values as the other methods.

stevengj · June 17, 2020, 4:44pm

Yes, it’s not the same matrix — the matrix \hat{H} in my post is different from the matrix \tilde{H} in the original paper, but if you modify the other algebra (the x update) to match, changing R^{1/2} to L or L^T consistently, then the final result (the x update) will be the same up to roundoff errors.

mleprovost · June 17, 2020, 4:56pm

Thank you so much for your help. In practice, we have a finite number of samples O(50-100) to estimate a covariance matrix of dimension 200-500 in my case. Therefore the term (\hat{H}P\hat{H}^T + I)^{-1} is usually ill-conditioned (positive semi-definite). Can I still use a Cholesky decomposition?

stevengj · June 17, 2020, 5:02pm

I don’t quite follow your question: the Cholesky factorization is applied to R, not this matrix. (And \hat{H}P\hat{H}^T + I should be strictly positive-definite anyway because of the I?)

mleprovost · June 17, 2020, 5:19pm

I need to check a few things before to answer. Let me get back to you tomorrow.

mleprovost · June 19, 2020, 3:47pm

Hello,

Thank you @stevengj for your answers.
I wrote my code on the pseudo-code of the ETKF provided by Asch et al. SIAM - Bookstore , chapter 6:

where E = [x^1, \ldots, x^m] with x^i the i-th sample.

In my case, R is a diagonal matrix (with entries of order 10^{-8}), so its inverse square root is straightforward to compute.
The slow operations are to compute w in l.8 (the least-square problem) and T^{1/2} in l.9.
In my case, the determinant of T is small ~10^{-12}, so I don’t know if it makes sense to do a Cholesky decomposition of T.

antoine-levitt · June 19, 2020, 6:13pm

Determinants don’t exist, they’re a conspiracy propagated by linear algebra teachers. If you have 12 eigenvalues of 0.1, the determinant is 10^-12 which sounds scary, but it’s still perfectly reasonable to invert the matrix. More interesting is the condition number. In any case, either the thing is well-conditioned, in which case you can do what you want with it, or it’s ill-conditioned, in which case whatever you do with it will blow up, so it doesn’t really matter what you do (simplifying a full numerical analysis course ever so slightly).

It is likely that you can bypass the square root. Often square roots are made to change variables to ones where some overlap/Gram/covariance/whatever matrix is identity; this can also be done with Cholesky as @stevengj showed. You have to understand the derivation of the algorithm and see where the square root is there.

Topic		Replies	Views
Matrix square root (Error) General Usage package	5	4794	March 31, 2019
Fast square root of 3 x 3 matrix General Usage linearalgebra , staticarrays	8	1152	October 28, 2022
Square root of a matrix that has no square root General Usage matrices	3	424	December 14, 2021
Inverse of a boolean (or integer) matrix vs factorization Numerics nemo , factorization , inverse	7	816	June 8, 2021
Computing sparse orthogonal projections Numerics linearalgebra	6	1933	June 10, 2021

Square root of the inverse of a matrix

Related topics