Getting strange MV Normal samples based on sparse precision matrix

I’ve noticed something odd while trying to generate Gaussian Markov random fields on an irregular mesh. In this model, a spatial field is a multivariate normal variable defined on the mesh nodes. The distribution is defined in terms of its precision matrix Q rather than its covariance \Sigma = Q^{-1}, since the Markov property means that Q is very sparse.

In theory, random samples can be generated efficiently from a precision matrix like so:

z = randn(size(Q,1))
x = cholesky(Q).U \ z

However, I’ve noticed that doing this with a sparse matrix Q produces bad samples–too spiky, without the correct correlation structure. In contrast, samples generated from the covariance matrix or the precision matrix converted to a dense matrix both look right (this precision matrix is defined such that the decorrelation scale should be about 20):

using DataFrames, CSV
using Random, SparseArrays, LinearAlgebra, Plots

Q_df = CSV.read("precision_matrix.csv")
point_df = CSV.read("mesh_points.csv")
Q = sparse(df.i, df.j, df.q)
Σ = Symmetric(inv(Matrix(Q)))

Random.seed!(2)
z = randn(size(Q, 1))

x = cholesky(Σ).L * z
p1 =surface(point_df.x, point_df.y, x, camera=(45, 80), title="Dense Σ")

xq = cholesky(Matrix(Q)).U \ z
p2 = surface(point_df.x, point_df.y, xq, camera=(45, 80), title="Dense Q")

xqs = cholesky(Q).U \ z
p3 = surface(point_df.x, point_df.y, xqs, camera=(45, 80), title="Sparse Q")

plot(p1, p2, p3, layout=(1,3), size=(1200, 400), colorbar=false)

The .csv files are available here:

This has to be some numeric issue with the sparse Cholesky and/or backsolve, right? Any ideas for what’s happening, or for fixes or workarounds?

1 Like

@ElOceanografo do you have a reference on the precision matrix method? In GeoStats.jl’s LUGaussSim solver we implement the LU with the covariance matrix. The good thing about the covariance method is that it can be adapted for conditional simulation when data is available at specific points of the mesh.

It’s constructed using the stochastic partial differential equation method in Lindgren et al. 2011, which I’ve been working on implementing in Julia.

I’ve actually used GeoStats to do conditional simulations for my work (thanks for the great package!). At the moment though I’m looking for ways to plug GPs into more elaborate probabilistic models (e.g. in Turing) and to scale them up to larger datasets. The SPDE/GMRF method is appealing for its scaling properties, both storage- and computation-wise.

2 Likes

That is really nice! If you are working on a Gaussian solver for spatial variables, it would be great to integrate with the project. All the Gaussian solvers we have implemented currently live in GaussianSimulation.jl in case you feel like contributing your work.

1 Like

The sparse factorization uses pivoting so you’d have to apply the permutation vector as well. Try with

cholesky(Q).PtL' \ z
1 Like

That was it, thanks! I didn’t know that about the sparse version (though I guess it is in the docs, now that I look).