# Singular Exception with LKJCholesky

Hi there,

I’ve seen the recent development of the `LKJCholesky` implementation in `Distributions.jl` and am attempting to put it to use in `Turing.jl`. I understand that `Turing.jl` doesn’t yet support Cholesky-variate distributions. However, @sethaxen gave a few tips in the `Turing` Slack channel on how to use it in `Turing.jl` anyway:

Okay, I underestimated how long a proper write-up would take, so here’s the short version. It uses TransformVariables instead of Bijectors, since that simplifies the code and lets us use only API functions. Instead of doing

```julia
R ~ LKJ(n, eta)
```

import TransformVariables and do

```julia
trans = CorrCholeskyFactor(n)
R_tilde ~ filldist(Flat(), dimension(trans))
R_U, logdetJ = transform_and_logjac(trans, R_tilde)
Turing.@addlogprob! logpdf(LKJCholesky(n, eta), Cholesky(R_U)) + logdetJ
```

When using the correlation matrix, instead of doing

```julia
Σ = Diagonal(σ) * R * Diagonal(σ)
y ~ MvNormal(μ, Σ)
```

import PDMats and do

```julia
Σ_L = LowerTriangular(collect(σ .* R_U'))  # collect is necessary for ReverseDiff for some reason
Σ = PDMat(Cholesky(Σ_L))
y ~ MvNormal(μ, Σ)
```

Given this info, I put together an example modeled after this blog post on using an LKJ prior in Stan. It uses simulated data, but I am seeing intermittent `SingularException` errors with this example: it doesn’t always error, and sometimes it samples really well.

Simulated data:

```julia
using Distributions, Turing, LinearAlgebra, Random

Random.seed!(121)

# generate data
sigma = [1, 2, 3]
Omega = [1.0 0.3 0.2;
         0.3 1.0 0.1;
         0.2 0.1 1.0]

Sigma = Diagonal(sigma) * Omega * Diagonal(sigma)
N = 100
J = 3

y = rand(MvNormal(Sigma), N)'
```

The model in `Turing.jl`:

```julia
using TransformVariables, PDMats

# model
@model function correlation_cholesky(J, N, y)
    sigma ~ filldist(truncated(Cauchy(0.0, 5.0); lower = 0), J)

    # LKJ workaround
    n, eta = J, 1.0
    trans = CorrCholeskyFactor(n)
    R_tilde ~ filldist(Flat(), dimension(trans))
    R_U, logdetJ = transform_and_logjac(trans, R_tilde)
    F = Cholesky(R_U)
    Turing.@addlogprob! logpdf(LKJCholesky(n, eta), F) + logdetJ

    Σ_L = LowerTriangular(collect((sigma .+ 1e-6) .* R_U'))
    Sigma = PDMat(Cholesky(Σ_L))

    for i in 1:N
        y[i, :] ~ MvNormal(Sigma)
    end
    return (; Omega = Matrix(F))
end
```

Then sample the model:

```julia
mod = correlation_cholesky(J, N, y)

chain_chol = sample(mod, NUTS(0.65), 2000)
```

It’s here that the error commonly occurs. Any info would be appreciated.


This seems to happen when a divergence is hit during warm-up. A divergence can cause the parameters to fly off to infinity in unconstrained space. For the Cholesky factors, flying to infinity could result in the diagonal of the Cholesky factor containing a zero. When this happens, the covariance matrix is no longer positive definite and is instead positive semi-definite, which causes `PDMats.invquad` to throw the error you are seeing as it tries to call `ldiv!` with the triangular Cholesky factor.
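As a quick illustration of why a zero on the diagonal of the Cholesky factor is fatal (a minimal sketch using only `LinearAlgebra`, not the actual Turing or PDMats internals):

```julia
using LinearAlgebra

# A triangular factor with a zero on its diagonal corresponds to a
# rank-deficient, merely positive semi-definite matrix.
L = LowerTriangular([1.0 0.0;
                     0.5 0.0])  # zero at position (2, 2)
isposdef(L * L')                # false: the "covariance" is singular

# Solving against the singular triangular factor, as the quadratic-form
# evaluation does internally, throws the error from the question:
ldiv!(L, [1.0, 1.0])            # throws SingularException
```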

I suspect Stan samples this just fine for at least one of two reasons:

1. While Turing chooses a default target acceptance rate (`delta`) of 0.65, Stan chooses 0.8. This reduces the number of divergences in general, so probably also during warm-up.
2. Stan may have checks so that, instead of simply throwing an error in this case, an infinite log density is returned, which would be logged as a divergence.

We can try both. When I used `NUTS(0.8)`, I never once saw the error across 20 runs.

Here’s how we could use the 2nd option:

```julia
# model
@model function correlation_cholesky(J, N, y)
    sigma ~ filldist(truncated(Cauchy(0.0, 5.0); lower = 0), J)

    # LKJ workaround
    n, eta = J, 1.0
    trans = CorrCholeskyFactor(n)
    R_tilde ~ filldist(Flat(), dimension(trans))
    R_U, logdetJ = transform_and_logjac(trans, R_tilde)
    F = Cholesky(R_U)
    Turing.@addlogprob! logpdf(LKJCholesky(n, eta), F) + logdetJ

    Σ_L = LowerTriangular(collect((sigma .+ 1e-6) .* R_U'))
    Sigma = PDMat(Cholesky(Σ_L))

    if any(i -> iszero(Σ_L[i]), diagind(Σ_L))
        # Singular factor: mark the draw as a divergence instead of erroring
        Turing.@addlogprob! -Inf
    else
        for i in 1:N
            y[i, :] ~ MvNormal(Sigma)
        end
    end
    return (; Omega = Matrix(F))
end
```

When I sample with this model, I sometimes get warnings of numerical error, but these seem to only occur during warm-up, so no problems there.

This is great and a really informative solution as well! Thanks Seth. If you don’t mind, a couple of follow-up questions:

1. How did you notice this was happening during warm-up rather than during the actual sampling of the target distribution?
2. If Stan possibly catches this error and returns an infinite log density instead, do you think Turing should also handle this case whenever `F ~ LKJCholesky(n, eta)` is officially implemented?

Thanks for the help

Marking the log density as `-Inf` is logged as a `:numerical_error` in the chains (any value that fails the `isfinite` check for the log density or its gradient is), so if we see a numerical-error warning during sampling and yet count no numerical errors afterwards (with `sum(chain_chol[:numerical_error])`), then the divergences occurred during warm-up.

Hm, not certain. Turing could put a `try`/`catch` around all log-density evaluations and mark any errors as divergences. I’m guessing this would be closer to what Stan does, but I’m also guessing it would be less efficient and make debugging Turing models much harder. Another Turing dev might have a more useful answer, though.
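To make the trade-off concrete, here is a hypothetical sketch of that approach (this is not Turing’s actual API, and `safe_logdensity` is an illustrative name): any error thrown during evaluation is converted into `-Inf`, which a sampler would record as a divergence rather than aborting, at the cost of hiding genuine bugs.

```julia
# Hypothetical wrapper: convert any thrown error into -Inf so the
# sampler treats the proposal as a divergence instead of crashing.
function safe_logdensity(f, θ)
    try
        f(θ)
    catch
        -Inf
    end
end

safe_logdensity(θ -> log(θ), 1.0)   # 0.0, evaluation succeeds
safe_logdensity(θ -> sqrt(θ), -1.0) # -Inf, sqrt throws DomainError
```

The downside is exactly the one mentioned above: a typo that throws on every evaluation would silently show up as 100% divergences instead of a stack trace.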


Thanks again Seth, this is really helpful.
