What is the best way to factorize/decompose a covariance matrix?

I need to draw from a certain multivariate normal distribution. To do so, I draw from the standard normal distribution and then transform those draws into draws from the multivariate normal distribution. Because the covariance matrix is only positive semi-definite, I cannot use the Cholesky factor, so I use a spectral decomposition instead. However, because of numerical approximation, some eigenvalues of the matrix are very small but negative. This is problematic because I need to take the square root of those eigenvalues. I wonder if there is a better and more stable way to do this?

My current code looks like this, where cov is my covariance matrix:

using LinearAlgebra
cov_spectral = eigen(cov)

Then I get cov_spectral.vectors and cov_spectral.values, corresponding to the eigenvectors and eigenvalues. Some elements of cov_spectral.values are small but negative.

Have you told Julia the matrix is Symmetric? That will make most decompositions about 2x faster, and possibly more stable due to using better algorithms.

I just tried it, but I still get negative eigenvalues.

What’s the condition number of your matrix? If it’s really big, you might not be able to do much.

If you know from theory that your matrix is psd, just adjust those eigenvalues to zero. Better yet, just pick the relevant subspace (there is no point in drawing random numbers you end up multiplying by 0 anyway).
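A minimal sketch of that suggestion (my code, not the poster's), assuming the matrix really is PSD in exact arithmetic: clamp the slightly negative eigenvalues and keep only the clearly positive subspace, so you only draw as many standard normals as the numerical rank.

using LinearAlgebra

E = eigen(Symmetric(cov))                # `cov` is the covariance matrix from the question
tol = 1e-12 * maximum(E.values)          # relative cutoff; an arbitrary choice for illustration
keep = findall(x -> x > tol, E.values)   # indices of the clearly positive eigenvalues
V = E.vectors[:, keep] * Diagonal(sqrt.(E.values[keep]))   # V * V' ≈ cov
draw = V * randn(length(keep))           # one draw from N(0, cov)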

4 Likes

Would picking the relevant subspace mean doing an eigendecomposition and keeping only the positive eigenvalues, or doing a Cholesky with pivoting and keeping the nonzero diagonal elements?
My understanding is that one can use pivoting in such cases.

If I understand correctly, numerically negative eigenvalues can occur here, so I would go with an eigendecomposition instead of a Cholesky.

Your best bet would be to avoid the covariance matrix altogether. If you have a transform that maps into your other space just use it (or use the Q factor from a qr decomposition). You’ll get much higher accuracy this way by avoiding the squaring & cancellation that occurs when you compute the covariance.
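One concrete instance of this (my illustration for the case of an empirical covariance; the original transform may of course look different): if the covariance would be computed as Xc' * Xc / (n - 1) from centered data Xc, then the R factor of a QR of Xc is already a square-root factor, so the covariance never has to be formed.

using LinearAlgebra, Statistics

X  = randn(1000, 5)                      # hypothetical data: 1000 samples of 5 variables
Xc = X .- mean(X, dims=1)                # center each column
R  = qr(Xc).R                            # Xc' * Xc = R' * R, so cov(X) = R' * R / (n - 1)
V  = R' ./ sqrt(size(X, 1) - 1)          # V * V' ≈ cov(X); use V * randn(5) for sampling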

If you must use the covariance matrix, consider bunchkaufman. You can handle the 2x2 blocks and negative entries manually (either truncate negative values or take their absolute value, for instance).
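A hedged sketch of what that could look like (my own illustration, not a tested recipe): eigendecompose the block-diagonal D factor, which is cheap because it only has 1x1 and 2x2 blocks, clamp its negative eigenvalues, and reassemble a factor, assuming C is PSD up to roundoff.

using LinearAlgebra

function bk_factor(C)
    F  = bunchkaufman(Symmetric(C); check=false)   # C[F.p, F.p] ≈ F.U * F.D * F.U'
    ED = eigen(Symmetric(Matrix(F.D)))             # F.D is tridiagonal with 1x1/2x2 blocks
    W  = ED.vectors * Diagonal(sqrt.(max.(ED.values, 0.0)))  # clamp small negative eigenvalues
    return (F.U * W)[invperm(F.p), :]              # undo the permutation; result V has V * V' ≈ C
end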

Finally, if the matrix really is positive semidefinite, consider using cholesky(C, Val(true); check=false), which will compute the decomposition until it gets to a negative diagonal entry and then terminate. With pivoting, that should happen when you’ve handled all the components that are clearly positive.

4 Likes

As a side note to the discussion, there’s the Diagonalizations.jl package which provides some useful functionalities for factorizing cov matrices, if anything it might be helpful as something to benchmark against.

3 Likes

Depending on what the bottleneck in your code is, it might make sense to just eat the cost of an SVD or something once and then obtain a truncated symmetric factor.

If an SVD is too expensive but you have other things going for you, like knowing the rank of the matrix a priori or having a fast matvec or something, you could use a randomized method from LowRankApprox.jl, which is an excellent package. To be clear, even if you don’t have those things you’d probably still benefit from LowRankApprox’s randomized methods. My main point is that if you knew the rank a priori, for example, you’d precisely know your gain in time complexity from using the randomized method.
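For illustration, something along these lines (hedged: the rank keyword is how I read LowRankApprox's options, so check the package docs before relying on it), assuming C is the covariance matrix and r its known rank:

using LinearAlgebra, LowRankApprox

function randomized_factor(C, r)
    F = psvd(C, rank=r)                   # randomized partial SVD truncated at rank r
    return F.U * Diagonal(sqrt.(F.S))     # for PSD C, U ≈ V, so this V has V * V' ≈ C
end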

As several other people point out though, it would be nice to have more details about your problem. How big is the matrix? How much do you know about it? Is it using a parametric covariance function, or is it an empirical covariance matrix (which then has a symmetric factor for free)? Is it mathematically indefinite, or are you just using a very smooth covariance function and experiencing rank issues?

Hi all, thank you so much for all the suggestions; I will try them one by one. As many of you suggested, let me post the details of my code below. The big picture is an optimization problem called the simulated method of moments, in which I need to estimate the parameters of the multivariate normal distribution. It also involves simulating this distribution for numerical integration.

I have an N by N matrix of random variables, i.e., there are N^2 random variables in total. Here, I set N=30 for testing the code, but it would be N=500 for the real problem. These N^2 random variables follow a multivariate normal distribution, which is defined by

Corr(\nu_{ij},\nu_{i'j'})=0, \text{ if } i\neq i' \text{ and } j\neq j'
Corr(\nu_{ij},\nu_{i'j'})=\rho_\nu, \text{ if } i=i' \text{ or } j=j' \text{ but not both}
SD(\nu_{ij})=\sigma_\nu

where (i,j) is the index into the N by N matrix. So they all have the same standard deviation, and they have the same correlation if the two random variables are in the same column or the same row of the matrix. Here is my code (the issue is in the second function, fun_spectral):

using Distributions
using Random
using LinearAlgebra

N = 30 #size of the market
T = 2 #number of markets
num_moments = 2 #number of moments
num_simulation = 2 #number of draws for each numerical integration
rho_nu_est = 0.5 #correlation
sd_nu_est = 1 #standard deviation

# Fix the draws for unobservables.
# I first draw from standard normal distribution, then transform them to multivariate normal.
function fun_fixed_draws(num_moments::Int, num_simulation::Int, T::Int, N::Int)
    Random.seed!(1)
    #make it an array of arrays; each inner array contains num_simulation simulations
    #the dimensions are T * num_moments, so each inner array is for one (T, num_moments) combination
    fixed_draws_temp = Array[]
    for i = 1:num_moments*T
        push!(fixed_draws_temp, randn(N^2, num_simulation))
    end
    fixed_draws_temp = reshape(fixed_draws_temp, T, num_moments)
    return fixed_draws_temp
end
fixed_draws = fun_fixed_draws(num_moments, num_simulation, T, N)

# Transform the draw by spectral method.
# Here, I need to compute the covariance matrix and then factorize it.
function fun_spectral(rho_nu_est, sd_nu_est)
    #calculate covariance matrix of $\nu$ given parameters in the distribution
    cov_nu_temp = zeros(N^2, N^2)
    for i = 1:N^2
        cov_nu_temp[i,i] = sd_nu_est^2
    end
    for i = 1:N
        for j = 1:N
            for t = 1:N
                for k = 1:N
                    if i == t && j != k
                        cov_nu_temp[(i-1)*N+j, (t-1)*N+k] = rho_nu_est * sd_nu_est^2
                    elseif i != t && j == k
                        cov_nu_temp[(t-1)*N+k, (i-1)*N+j] = rho_nu_est * sd_nu_est^2
                    end
                end
            end
        end
    end
    #calculate the spectral decomposition
    #I set negative eigenvalues to zero
    cov_nu_spectral = eigen(cov_nu_temp)
    for i = 1:N^2
        if cov_nu_spectral.values[i]<=0
            cov_nu_spectral.values[i]=0
        end
    end
    #transform the original draw to draws from multivariate normal distribution
    fixed_draws_spec_temp = Array[]
    #in the for loop, i comes first, then t; otherwise the reshape will not put things in the correct order
    for i = 1:num_moments
        for t = 1:T
            push!(fixed_draws_spec_temp, cov_nu_spectral.vectors*diagm(sqrt.(cov_nu_spectral.values))*fixed_draws[t,i])
        end
    end
    #reshape the draws so that each column contains the draws for one moment across all markets
    fixed_draws_spec_temp = reshape(fixed_draws_spec_temp, T, num_moments)
    return fixed_draws_spec_temp
end
fixed_draws_spectral = fun_spectral(rho_nu_est, sd_nu_est)

I think you might just have a book-keeping error in your code or aren’t enforcing the proper parametric constraints on the matrix. It’s easy to come up with choices of those parameters that make the matrix not positive definite, so you’d need to take that into account when you do the optimization.

As a somewhat unrelated comment, I personally like organizing code for function-generated matrices so that I can always fill in entries with something like Mat[j,k] = fn(something[j], something[k], parameters), which in my opinion makes book-keeping much easier. Unless I’ve misunderstood your problem, here’s an alternative code that samples from your multivariate normal distribution and works fine with Cholesky:

function covfn(p1, p2, params) 
  agreements = sum(p1 .== p2)
  ifelse(agreements==2, params[1], ifelse(agreements==1, params[2], 0.0))
end

function sample(nsample, points, params)
  M = Symmetric([covfn(pj, pk, params) for pj in points, pk in points])
  L = cholesky!(M).L
  return [L*randn(length(points)) for _ in 1:nsample]
end

# Sample:
const N = 30
const pts = vec(collect(Iterators.product(1:N, 1:N)))
example = sample(10, pts, [1.0, 0.4])

As an aside, if you really needed this to scale, your covariance matrix here has a huge amount of structure that isn’t exploited in this code. But unless you really need it to scale, this is probably good enough, and compute hours are much less valuable than human hours.

1 Like

I believe you are right. There must be some bugs in my code. I rewrote my code as you suggested and it turns out to be fine. Thank you very much.

I tried the pivoted Cholesky cholesky(C, Val(true)) and got a RankDeficientException. I then tried cholesky(C, Val(true); check=false) as you suggested. I multiplied the lower-triangular matrix from this by the random draws from the standard normal distribution, so that these draws should be equivalent to draws from a multivariate normal distribution with covariance matrix C. However, I find that the correlation is not correct.

My task is to first draw from standard normal distribution, then convert these draws to be equivalent to draws from multivariate normal distribution with a known covariance matrix. So I need to find a way to factorize this covariance matrix which could be positive SEMI-definite. As @cgeoga suggested, my code is like this.

using LinearAlgebra, Statistics

#dimension of the covariance matrix is N^2 by N^2
N = 3

#assign the covariance according to matrix index
function fun_nu_cov(p1, p2, rho_nu, sd_nu)
    agreements = sum(p1 .== p2)
    ifelse(agreements==2, sd_nu^2, ifelse(agreements==1, rho_nu * sd_nu^2, 0.0))
end

#initialize the index
const index = vec(collect(Iterators.product(1:N, 1:N)))

#this function computes the lower triangular matrix from Cholesky
function fun_lower_cholesky(rho_nu_est::Float64, sd_nu_est::Float64)
    nu_cov_temp = Symmetric([fun_nu_cov(p1, p2, rho_nu_est, sd_nu_est) for p1 in index, p2 in index])
    return cholesky(nu_cov_temp).L
end

#transform the random draw from standard normal to multivariate normal
S = 1000 #number of simulations
function fun_cholesky(rho_nu_est::Float64, sd_nu_est::Float64)
    fixed_draws_temp = randn(N^2, S)
    lower_cholesky_temp = fun_lower_cholesky(rho_nu_est, sd_nu_est)
    draws_temp = lower_cholesky_temp * fixed_draws_temp
    return draws_temp
end

#try correlation 0.5 and standard deviation 1
temp = fun_cholesky(0.5, 1.0)

#test the correlation and variance
cor(temp[1,:],temp[2,:]) #this should be about 0.5
var(temp[1,:]) #this should be about 1

When rho_nu_est=0.5 and sd_nu_est=1, the covariance matrix is only positive semi-definite, so my code does not work because it uses Cholesky.

If I change it to cholesky(nu_cov_temp, Val(true)).L, I get a RankDeficientException.

If I change it to cholesky(nu_cov_temp, Val(true); check=false).L as suggested by @tim.holy , it does not pass the correlation test.

Currently, my solution is to rewrite function fun_lower_cholesky as follows.

function fun_lower_cholesky(rho_nu_est::Float64, sd_nu_est::Float64)
    nu_cov_temp = Symmetric([fun_nu_cov(p1, p2, rho_nu_est, sd_nu_est) for p1 in index, p2 in index])
    eigen_temp = eigen(nu_cov_temp)
    return eigen_temp.vectors * diagm(sqrt.(abs.(eigen_temp.values)))
end

I take the absolute value of eigen_temp.values to handle the eigenvalues that are slightly negative due to numerical error.

With Val(true) you have to account for the permutation (which you’re ignoring). With pivoting you can recover rank 6, whereas without it’s only rank 4.

You also need to make sure you zero-out the remainder of the result. For example, not using pivoting to keep it simple,

julia> C = cholesky(nu_cov_temp, Val(false); check=false)
Failed factorization of type Cholesky{Float64,Array{Float64,2}}

julia> C.L
9×9 LowerTriangular{Float64,Array{Float64,2}}:
 1.0    ⋅          ⋅         ⋅          ⋅            ⋅    ⋅    ⋅    ⋅ 
 0.5   0.866025    ⋅         ⋅          ⋅            ⋅    ⋅    ⋅    ⋅ 
 0.5   0.288675   0.816497   ⋅          ⋅            ⋅    ⋅    ⋅    ⋅ 
 0.5  -0.288675  -0.204124  0.790569    ⋅            ⋅    ⋅    ⋅    ⋅ 
 0.0   0.57735   -0.204124  0.790569  -2.22045e-16   ⋅    ⋅    ⋅    ⋅ 
 0.0   0.0        0.612372  0.790569   0.5          1.0   ⋅    ⋅    ⋅ 
 0.5  -0.288675  -0.204124  0.158114   0.0          0.0  1.0   ⋅    ⋅ 
 0.0   0.57735   -0.204124  0.158114   0.5          0.0  0.5  1.0   ⋅ 
 0.0   0.0        0.612372  0.158114   0.0          0.5  0.5  0.5  1.0

Everything in the lower triangle at or below the -2.22045e-16 is corrupted, so you need to set all those values to 0. But even then you can get the wrong result: for a PSD input matrix, only with pivoting can you guarantee that you have extracted all the necessary components. For example:

julia> v1 = [0, 1, 2]; v2 = [0, 2, 1];

julia> A = v1*v1' + v2*v2'
3×3 Array{Int64,2}:
 0  0  0
 0  5  4
 0  4  5

julia> C = cholesky(A, Val(false); check=false)
Failed factorization of type Cholesky{Float64,Array{Float64,2}}

julia> C.L
3×3 LowerTriangular{Float64,Array{Float64,2}}:
 0.0   ⋅    ⋅ 
 0.0  5.0   ⋅ 
 0.0  4.0  5.0

julia> C = cholesky(A, Val(true); check=false)
CholeskyPivoted{Float64,Array{Float64,2}}
U factor with rank 2:
3×3 UpperTriangular{Float64,Array{Float64,2}}:
 2.23607  1.78885  0.0
  ⋅       1.34164  0.0
  ⋅        ⋅       0.0
permutation:
3-element Array{Int64,1}:
 2
 3
 1

Without pivoting the result is a disaster here, but you get what you want with the pivoted version.
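Putting those pieces together, here is a minimal sketch (mine, not from the thread) of assembling a usable factor from the pivoted result: truncate at the detected rank and undo the permutation, assuming the input really is PSD.

using LinearAlgebra

function pivoted_factor(C)
    F = cholesky(Symmetric(C), Val(true); check=false)   # pivoted; don't throw on rank deficiency
    r = F.rank                                            # numerical rank found by the pivoting
    V = F.L[:, 1:r]                                       # columns beyond r are not meaningful
    return V[invperm(F.p), :]                             # undo the pivoting, so V * V' ≈ C
end

For the 3x3 example above, pivoted_factor(A) * pivoted_factor(A)' recovers A, and in general V * randn(size(V, 2)) gives a draw with the desired covariance.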

If any of this is confusing, I highly recommend reading up on how the Cholesky factorization is actually computed and why pivoting is important for problems like these.

2 Likes

To suggest something less technical, you don’t actually need a Cholesky factor to do what you’re doing. You just need some matrix V so that \Sigma = V V^T, because for a multivariate normal the variance transforms as \text{Var}(V \textbf{z}) = V \text{Var}(\textbf{z}) V^T = V I V^T = V V^T, which is exactly what you want if \textbf{z} is a vector of iid standard normals.

So if the rank stuff is a concern, you could always get a less efficient symmetric factor from the SVD for example. This function will give you a symmetric factor for rank deficient matrices and also truncate in that case, although that isn’t strictly necessary.

# Warning: consumes A. Gives M so that M*M' = A.
function symfact(A, tol=1.0e-14)
  @assert issymmetric(A) "A should be symmetric."
  Asvd = svd!(A)
  ix   = findall(x->x>tol, Asvd.S)
  return Asvd.U[:,ix] * Diagonal(sqrt.(Asvd.S[ix]))
end

This is obviously not efficient or optimal or anything cool like that, but it will work, and then you can get back to doing the other stuff you wanted to be doing. And considering that at least your sample code was for 900 \times 900 matrices, I don’t think the slight inefficiency of this is really going to hurt. People suggest using the Cholesky for simulations because it’s faster than plenty of other factorizations thanks to the amount of structure it can exploit. But there’s nothing statistical about that particular factorization, so if it isn’t convenient then I don’t think there’s a particular reason to fight with LAPACK to use it.

3 Likes

@cgeoga, thank you very much. There might be a glitch in your code: the output of this function is not a square matrix. But I guess I know what you mean here. One more question: how does the performance of the SVD compare with that of the spectral (eigen) decomposition? The dimension of the underlying covariance matrix could be very large, say 250000 by 250000, so I would like to choose the fastest one.

Currently, my code is like this.

function fun_spectral(rho_nu_est::Float64, sd_nu_est::Float64)
    nu_cov_temp = Symmetric([fun_nu_cov(p1, p2, rho_nu_est, sd_nu_est) for p1 in index, p2 in index])
    if isposdef(nu_cov_temp)
        return cholesky(nu_cov_temp).L
    else
        # eigen_temp = eigen(nu_cov_temp)
        # return eigen_temp.vectors * diagm(sqrt.(abs.(eigen_temp.values)))
        Asvd = svd(nu_cov_temp)
        return Asvd.U * diagm(sqrt.(abs.(Asvd.S)))
    end
end

Thanks again.

@zxjroger—no glitch there. That’s a feature of that function. If your matrix is only positive semi-definite because it is rank deficient, then it admits a representation like \Sigma = V V^T where V is not square, and then you can work with the smaller matrix.

And with regard to the size of your matrices: yikes, I thought you were working with small enough matrices that fine-grained performance didn’t matter. The eigendecomposition is definitely going to be faster, but a 250k square matrix will ask for about 500 GB of RAM (250000^2 entries at 8 bytes each), so if you really wanted something of that size you’d have to do something reasonably clever just to store it, let alone work with it, and so you wouldn’t be able to use any of these off-the-shelf solutions we’ve been discussing.

Your matrix would have large blocks that are rank one and the vast majority of it would be zero, so a thoughtful custom-rolled solution would still make it very easy to work with. But if you wanted to use an off-the-shelf method, your only option is probably to use sparse methods.

I suppose, in a word: without spending some time writing nontrivial code to exploit the structure in your linear algebra, the eigendecomposition approach is the fastest choice.
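For what it’s worth, one concrete custom-rolled approach (a sketch I derived from the correlation pattern posted earlier, not something from this thread, and only valid for 0 ≤ ρ_ν ≤ 1/2): the same-row/same-column correlation can be produced additively from independent row effects, column effects, and idiosyncratic noise, so a draw never requires forming the N^2 by N^2 matrix at all.

function draw_nu(N, rho_nu, sd_nu)
    @assert 0 <= rho_nu <= 0.5 "this construction needs 0 <= rho_nu <= 1/2"
    noise = randn(N, N)                   # idiosyncratic part for each (i, j)
    a = randn(N)                          # shared row effects
    b = randn(N)                          # shared column effects
    return sd_nu .* (sqrt(1 - 2rho_nu) .* noise .+ sqrt(rho_nu) .* (a .+ b'))
end

Each entry has variance sd_nu^2, entries sharing a row or a column have correlation rho_nu, and everything else is uncorrelated, so for example draw_nu(500, 0.5, 1.0) handles the N = 500 case directly.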

1 Like