Multivariate Normal with Positive Semi-Definite Covariance Matrix

question

#1

Hi All,

I’m using julia to sample from a gaussian process with squared exponential kernel.

using Distributions, Distances

x_star = -5:0.1:5
K = exp(-0.5 * pairwise(SqEuclidean(), x_star'))
f_star = rand(MvNormal(K), 1)

fails with a Base.LinAlg.PosDefException. In this example, the points are close together and K becomes singular i.e. det(K) # 0.0.

The equivalent calculation in R:

x.star <- seq(-5, 5, by=0.1)
K = exp(-0.5 * rdist::pdist(x.star) ^ 2)
f = mvtnorm::rmvnorm(1, sigma=K)

works fine, because the mvtnorm package only requires Positive Semi-definite matrices for the covariance (due to the multiple options for matrix decomposition I guess? The “cholesky” method fails, but “svd” and “eigen” both work fine).

This leads me to my question: is it possible to sample from a multivariate normal distribution in Julia with a PSD matrix? Is there some argument/coercion that I’m missing?

By the way, you can “hack” your way around this error by adding a small constant to the diagonal elements of K, e.g.

x_star = -5:0.1:5
K_psd = exp(-0.5 * pairwise(SqEuclidean(), x_star'))
K_pd = copy(K_psd);
for i in 1:length(x_star)
    K_pd[i, i] = K_pd[i, i] + 0.001
end
det(K_pd) # 2.00e-256
det(K_psd) # 0.00
f_star = rand(MvNormal(K_pd), 1)

but it would be nice to not have to do this.


#2

I don’t think this is in the Distributions package yet.

I would use the LDLt factorization (see the docs for ldltfact()). If K = LDL.’, then I’d take an iid normal vector and left-multiply by LD^{1/2}.


#3

Looking at the source code, it seems that rmvnorm uses eigenvalues (not Cholesky) by default, and it accepts eigenvalues as “non-negative” if λ ≥ – max|λ| √ε.


#4

Thanks @stevengj. Do you think the JuliaStats organization would be open to a pull request to add eigenvalue decomposition as an optional method for the MvNormal?


#5

I think we’d be happy for such a contribution but it might require quite a bit of restructuring of the existing code. Some of it might also need a cleanup and I think the existing code might be more complicated than needed. I don’t know how familiar you are with degenerate normals but handling it will complicate some methods so that part might also require some work. Finally, it varies how much time the developers have available so a PR might take a while to get reviewed. It shouldn’t be like that but that is how it is.


#6

Thanks @andreasnoack. I’m still learning about Julia, but I’m familiar with degenerate multivariate normals. Just as a thought, I see in the documentation, that there are three different constructors for multivariate normal, depending on the covariance structure. What do you think of adding a fourth constructor (e.g. DegenNormal or something similar) to handle the cases where the covariance matrix is not of full rank? Or does that make things to confusing?


#7

Since Cholesky is so much faster than eigenvalue computation, I think it would make more sense to first try cholfact in a try block by default. Then, if it catches a positive-definite exception, it can do a (hermitian) diagonalization and check whether the negative eigenvalues are sufficiently small to ignore.


#8

@stevengj That seems reasonable to me. I think it would be nice to have an optional argument to force an eigen decomposition without trying the cholesky decomposition first. I don’t think this would be too challenging to add to the MvNormal constructor, aside from the current requirement for a PDMat. Thoughts?


#9

An eigendecomposition is 25–50x more expensive than Cholesky on my machine, so you will hardly notice that it is trying Cholesky first.


#10

@stevengj I knew that cholesky decomposition was faster than an eigen decomposition, but didn’t realize there could be that much of an advantage… Agreed then - trying cholesky first would be hardly noticeable, especially if you are trying multiple covariance matrices that may or may not be PD.

So do you think it makes sense to modify the current MvNormal class which requires a PDMat? The \Sigma argument could be of type PSDMat (or something like that) and the constructor will first try a cholesky decomposition (as you said), followed by the eigen decomposition? Or should this class be split up into two?


#11

I’ve faced the same problem. I hope your team fix this bug and make this package more suitable for enjoying bayesian or statistical machine learning :slight_smile: