MvNormal distributions in the Distributions package have a large construction overhead because the underlying PDMats package performs a full Cholesky decomposition of the covariance matrix. In theory, I think this up-front cost can pay off when performing repeated operations on the distribution. An alternative package is GaussianDistributions, which has no noticeable construction overhead.
Below I’ve compared their performance on a combined construction-and-matrix-multiplication operation with a “moderately sized” dense array. The multiplication should be faster thanks to the Cholesky decomposition instead of the full array, but it is not.
Due to the Cholesky decomposition, MvNormal is still significantly slower, and increasingly so for smaller arrays. It also uses more memory. Even for arrays as large as 1024 × 1024, Gaussian is still faster.
Distributions has a lot of features that I would like to take advantage of, but it looks like the use of PDMats might be holding back performance.
Any advice on how to avoid this overhead with MvNormal distributions? Does anyone see a large speedup for their application?
using Distributions, GaussianDistributions, BenchmarkTools
# create valid multi-Gaussian
N = 256
mean = rand(N)
tmp = rand(N, N)
cov = tmp * tmp'
# create arbitrary transformation
A = rand(N, N)
G1 = MvNormal(mean, cov)
G2 = Gaussian(mean, cov)
# compare the two tools (interpolate globals with $ so BenchmarkTools
# times the call itself rather than global-variable lookup)
# ≈ 1.5 ms for me (0.5 ms construction, 1.0 ms multiplication)
@benchmark MvNormal($mean, $cov)
@benchmark $A * $G1
# ≈ 0.5 ms for me (≈0 ms construction, 0.5 ms multiplication)
@benchmark Gaussian($mean, $cov)
@benchmark $A * $G2
I am not sure why construction is a relevant benchmark. Presumably you want to do something with the distribution (evaluate pdfs, generate random draws), and for pretty much everything you will need a factorized covariance matrix.
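To illustrate the point, here is a stdlib-only sketch (not the actual Distributions internals, and `mvnormal_logpdf` is my own name) of why density evaluation wants a factorization up front:

```julia
using LinearAlgebra

# Stdlib-only sketch of a multivariate normal log-density.
# Both the quadratic form (x-μ)'Σ⁻¹(x-μ) and log det Σ fall out of a
# Cholesky factor; without one, every evaluation would pay for a fresh
# solve and determinant.
function mvnormal_logpdf(x, μ, Σ)
    C = cholesky(Symmetric(Σ))      # O(N³), but done once per distribution
    z = C.U' \ (x .- μ)             # whitened residual: O(N²) per evaluation
    return -0.5 * (length(x) * log(2π) + logdet(C) + dot(z, z))
end
```

Amortizing the single O(N³) factorization over many O(N²) evaluations is exactly the trade Distributions is making.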
Using Σ is fine for theory, but for actual calculations you should go out of your way to avoid constructing it. I think the constructor should go further and actually require a factor A (with AA' = Σ) for multivariate normals (Cholesky being a special case, but there is no harm in being generic). This would be consistent with requiring the standard deviation for Normal. The form with Σ should be deprecated and banished to dark oblivion.
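The factor-based representation can be sketched with the stdlib alone; `FactoredNormal`, `transform`, and `covariance` are hypothetical names for illustration, not anything in Distributions:

```julia
using LinearAlgebra

# Sketch of carrying a factor A with Σ = A*A' instead of Σ itself.
# A linear map B then updates the factor directly:
# cov(B*x) = B*Σ*B' = (B*A)*(B*A)', so no re-factorization is needed.
struct FactoredNormal{T}
    μ::Vector{T}
    A::Matrix{T}    # any factor with Σ = A*A' (a Cholesky L is one choice)
end

# Affine image of the distribution: only two matrix products, no cholesky().
transform(B::AbstractMatrix, d::FactoredNormal) = FactoredNormal(B * d.μ, B * d.A)

# Materialize Σ only when something actually asks for it.
covariance(d::FactoredNormal) = d.A * d.A'
```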
All the points you make are good ones, and your suggestion of removing Σ makes sense to me. Construction is sometimes a valid metric, if your application involves creating and performing simple operations on many distributions. But in general, point taken. I’ve changed my example above to explicitly show that the math is slower with MvNormal, not just the construction.
Linear operations on normal distributions are extremely common in many fields. Does anyone have any insight why the factored tri-product used by PDMats below is slower than when taken on Σ directly?
# from Distributions.jl
*(B::AbstractMatrix, d::MvNormal) = MvNormal(B * d.μ, X_A_Xt(d.Σ, B))
# from PDMats.jl
function X_A_Xt(a::PDMat, x::StridedMatrix)
    cf = a.chol.U
    z = rmul!(copy(x), transpose(cf))
    return z * transpose(z)
end
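For reference, here is a stdlib-only sketch of what that tri-product computes (`x_a_xt_sketch` is my name, not the PDMats function). With Σ = U'U from the Cholesky factorization, z = B·U' gives z·z' = B·U'·U·B' = B·Σ·B', so the factored form is algebraically identical to the direct product and involves a comparable number of N³ multiplies:

```julia
using LinearAlgebra

# Stdlib sketch of PDMats' X_A_Xt: with Σ = U'U (Cholesky),
# z = B*U' satisfies z*z' = B*U'*U*B' = B*Σ*B',
# i.e. the same result as forming the tri-product directly.
function x_a_xt_sketch(Σ::AbstractMatrix, B::AbstractMatrix)
    U = cholesky(Symmetric(Σ)).U
    z = B * transpose(U)            # like rmul!(copy(B), transpose(cf))
    return z * transpose(z)
end
```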
With https://github.com/mschauer/GaussianDistributions.jl, which is what we use for Gaussian uncertainty propagation (it also defines some linear operators for Gaussians), you are free to construct a Gaussian with either a Cholesky factor or a covariance matrix; maybe that helps.
Regarding cost, the PDMat code has to factorize again, because z is typically not triangular.
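A quick stdlib-only way to see this:

```julia
using LinearAlgebra

# z = B*U' is a dense product, not triangular, so z*z' does not hand the
# MvNormal constructor a ready-made factorization: cholesky() must run
# again on the transformed covariance.
N = 4
tmp = rand(N, N)
Σ = tmp * tmp' + I      # positive definite
B = rand(N, N)
U = cholesky(Symmetric(Σ)).U
z = B * transpose(U)
istriu(z)               # false for a generic B, hence the re-factorization
```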
I guess this shows why construction time could be an important metric: every time an operation such as addition or multiplication is performed on an MvNormal distribution, a new distribution is created, with the overhead of a fresh factorization.
My application has state-space aspects that require efficient affine transformations, like GaussianDistributions provides, but it also needs features from Distributions such as sampling, mixture models, etc. Is there a clean way to use the performance and flexibility of GaussianDistributions within the larger framework of Distributions to get the desired tools?
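One possible pattern (just a sketch, assuming both packages are loaded and using the affine `*` from GaussianDistributions shown earlier in the thread; `to_mvnormal` is my own helper, not part of either package): keep the hot affine-update loop on Gaussian, and convert to an MvNormal only at the boundary where Distributions features are needed, so the factorization is paid once.

```julia
using Distributions, GaussianDistributions, LinearAlgebra

# Hypothetical bridge: symmetrize the propagated covariance (floating-point
# round-off can break exact symmetry) and pay the Cholesky cost here, once.
to_mvnormal(g::Gaussian) = MvNormal(g.μ, Matrix(Symmetric(g.Σ)))

N = 32
tmp = rand(N, N)
g = Gaussian(rand(N), tmp * tmp' + I)    # cheap: no factorization
for _ in 1:5
    B = rand(N, N) / N                   # keep the covariance well scaled
    global g = B * g                     # cheap affine updates on Gaussian
end
d = to_mvnormal(g)                       # one Cholesky, at the boundary
x = rand(d)                              # Distributions features from here on
```

Whether this is clean enough depends on how often you cross the boundary; if every step needs a Distributions feature, the factorization cost comes right back.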