Singular Exception with LKJCholesky

Hi there,

I’ve seen the recent development of the LKJCholesky implementation within Distributions.jl and am attempting to put it to use within Turing.jl. I understand that Turing.jl doesn’t yet support Cholesky-variate distributions. However, @sethaxen gave a few tips in the Turing Slack channel on how to use it in Turing.jl anyway:

Okay, I underestimated how long a proper write-up would take, so here’s the short version. It uses TransformVariables instead of Bijectors, since that simplifies the code and lets us use only API functions. Instead of doing

R ~ LKJ(n, eta)

import TransformVariables and do

trans = CorrCholeskyFactor(n)

R_tilde ~ filldist(Flat(), dimension(trans))

R_U, logdetJ = transform_and_logjac(trans, R_tilde)

Turing.@addlogprob! logpdf(LKJCholesky(n, eta), Cholesky(R_U)) + logdetJ

When using the correlation matrix, instead of doing

Σ = Diagonal(σ) * R * Diagonal(σ)

y ~ MvNormal(μ, Σ)

import PDMats and do

Σ_L = LowerTriangular(collect(σ .* R_U')) # collect is necessary for ReverseDiff for some reason

Σ = PDMat(Cholesky(Σ_L))

y ~ MvNormal(μ, Σ)

Given this info, I put together an example modeled after this blog post on using a LKJ prior in Stan. It uses simulated data, but I am seeing SingularException errors with this example. It doesn’t always error, and sometimes it samples really well.

Simulated data:

using Distributions, Turing, LinearAlgebra, Random

Random.seed!(121)

# generate data
sigma = [1,2,3]
Omega = [1 0.3 0.2;
         0.3 1 0.1;
         0.2 0.1 1]

Sigma = Diagonal(sigma) * Omega * Diagonal(sigma)
N = 100
J = 3

y = rand(MvNormal(Sigma), N)'

The model in Turing.jl:

using TransformVariables, PDMats

# model
@model function correlation_cholesky(J, N, y)
  sigma ~ filldist(truncated(Cauchy(0., 5.); lower = 0), J)
  # LKJ work around
  n, eta = J, 1.0

  trans = CorrCholeskyFactor(n)
  R_tilde ~ filldist(Flat(), dimension(trans))
  R_U, logdetJ = transform_and_logjac(trans, R_tilde)
  F = Cholesky(R_U)
  Turing.@addlogprob! logpdf(LKJCholesky(n, eta), F) + logdetJ
  Σ_L = LowerTriangular(collect((sigma .+ 1e-6) .* R_U'))
  Sigma = PDMat(Cholesky(Σ_L))

  for i in 1:N
    y[i,:] ~ MvNormal(Sigma)
  end
  return (; Omega = Matrix(F))
end

Then sample the model:

mod = correlation_cholesky(J, N, y)

chain_chol = sample(mod, NUTS(0.65), 2000)

It’s here that the error commonly occurs. Any info would be appreciated.


This seems to happen when a divergence is hit during warm-up. A divergence can cause the parameters to fly off to infinity in unconstrained space. For the Cholesky factors, flying to infinity could result in the diagonal of the Cholesky factor containing a zero. When this happens, the covariance matrix is no longer positive definite and is instead positive semi-definite, which causes PDMats.invquad to throw the error you are seeing as it tries to call ldiv! with the triangular Cholesky factor.
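
As a minimal illustration of that failure mode (my own sketch, not code from the model above): a Cholesky factor with a zero on its diagonal can still be wrapped in a PDMat, but the triangular solve inside the MvNormal log-density then throws the SingularException.

using LinearAlgebra, PDMats, Distributions

# a lower-triangular "Cholesky factor" with a zero on the diagonal:
# the matrix it represents is only positive semi-definite
L = LowerTriangular([1.0 0.0; 0.5 0.0])

# PDMat wraps the factorization without (as far as I can tell) checking positive definiteness
Σ = PDMat(Cholesky(L))

# the Mahalanobis term requires a triangular solve against L, which throws SingularException
logpdf(MvNormal(zeros(2), Σ), [0.1, 0.2])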

I suspect Stan samples this just fine for at least one of two reasons:

  1. While Turing uses a default target acceptance rate (delta) of 0.65, Stan uses 0.8. A higher target acceptance rate decreases the number of divergences in general, so probably also during warm-up.
  2. Stan may have checks so that instead of simply throwing an error in this case, a log-density of -Inf is returned, which would be logged as a divergence.

We can try both. When I use NUTS(0.8), after 20 runs, I never once saw the error.
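
For the first option, that just means passing the higher target acceptance rate when constructing the sampler:

# option 1: raise the target acceptance rate from Turing's default of 0.65 to Stan's 0.8
chain_chol = sample(mod, NUTS(0.8), 2000)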

Here’s how we could use the 2nd option:

# model
@model function correlation_cholesky(J, N, y)
  sigma ~ filldist(truncated(Cauchy(0., 5.); lower = 0), J)
  # LKJ work around
  n, eta = J, 1.0

  trans = CorrCholeskyFactor(n)
  R_tilde ~ filldist(Flat(), dimension(trans))
  R_U, logdetJ = transform_and_logjac(trans, R_tilde)
  F = Cholesky(R_U)
  Turing.@addlogprob! logpdf(LKJCholesky(n, eta), F) + logdetJ
  Σ_L = LowerTriangular(collect((sigma .+ 1e-6) .* R_U'))
  Sigma = PDMat(Cholesky(Σ_L))

  # reject the proposal when the Cholesky factor has a zero on its diagonal
  # (i.e. Sigma is singular) by marking the log-density as -Inf
  if any(i -> iszero(Σ_L[i]), diagind(Σ_L))
    Turing.@addlogprob! -Inf
  else
    for i in 1:N
      y[i,:] ~ MvNormal(Sigma)
    end
  end
  return (; Omega = Matrix(F))
end

When I sample with this model, I sometimes get warnings of numerical errors, but these seem to occur only during warm-up, so no problems there.

This is great and a really informative solution as well! Thanks Seth. If you don’t mind, a couple of follow-up questions:

  1. How did you notice this was happening during warm-up and not during the actual sampling of the target distribution?
  2. If Stan possibly catches this error and returns an infinite log-density instead, do you think Turing should also handle this case whenever F ~ LKJCholesky(n, eta) is officially implemented?

Thanks for the help

Marking the log-density as -Inf gets logged as a :numerical_error in the chain (any value that fails the isfinite check for the log-density or its gradient does), so if we see numerical-error warnings during sampling but find no numerical errors in the chain afterwards (with sum(chain_chol[:numerical_error])), then the divergences must have occurred during warm-up.
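
Concretely, the check on the chain sampled above is just:

# count the numerical errors recorded in the (post-warm-up) chain;
# zero here means the divergences we were warned about all happened during warm-up
sum(chain_chol[:numerical_error])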

Hm, not certain. Turing could put a try...catch around all log-density evaluations and mark any errors as divergences; I’m guessing that would be closer to what Stan does. But it would probably be less efficient and would also make debugging Turing models much harder. Another Turing dev might have a more useful answer, though.
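
To make the idea concrete, here’s a rough sketch (this is not Turing’s actual internals, and safe_logdensity is just a made-up name for illustration):

# wrap a log-density evaluation so that any thrown error becomes a -Inf
# log-density (i.e. a rejected, divergent proposal) instead of crashing the sampler
function safe_logdensity(logdensity, θ)
  try
    return logdensity(θ)
  catch
    return -Inf
  end
end

The downside is exactly the one mentioned above: a genuine bug in the model would silently show up as divergences rather than as an error you can debug.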


Thanks again Seth, this is really helpful.
