Singular Exception with LKJCholesky

Hi there,

I’ve seen the recent development of the LKJCholesky implementation within Distributions.jl and am attempting to put it to use within Turing.jl. I understand that Turing.jl doesn’t yet support Cholesky-variate distributions. However, @sethaxen gave a few tips in the Turing Slack channel on how to use it in Turing.jl anyway:

Okay, I underestimated how long a proper write-up would take, so here’s the short version. It uses TransformVariables instead of Bijectors, since that simplifies the code and lets us use only API functions. Instead of doing

R ~ LKJ(n, eta)

import TransformVariables and do

trans = CorrCholeskyFactor(n)

R_tilde ~ filldist(Flat(), dimension(trans))

R_U, logdetJ = transform_and_logjac(trans, R_tilde)

Turing.@addlogprob! logpdf(LKJCholesky(n, eta), Cholesky(R_U)) + logdetJ

When using the correlation matrix, instead of doing

Σ = Diagonal(σ) * R * Diagonal(σ)

y ~ MvNormal(μ, Σ)

import PDMats and do

Σ_L = LowerTriangular(collect(σ .* R_U')) # collect is necessary for ReverseDiff for some reason

Σ = PDMat(Cholesky(Σ_L))

y ~ MvNormal(μ, Σ)

Given this info, I put together an example modeled after this blog post on using a LKJ prior in Stan. It uses simulated data, but I am seeing SingularException errors with this example. It doesn’t always error, and sometimes it samples really well.

Simulated data:

using Distributions, Turing, LinearAlgebra, Random

Random.seed!(121)

# generate data
sigma = [1,2,3]
Omega = [1 0.3 0.2;
         0.3 1 0.1;
         0.2 0.1 1]

Sigma = Diagonal(sigma) * Omega * Diagonal(sigma)
N = 100
J = 3

y = rand(MvNormal(Sigma), N)'

The model in Turing.jl:

using TransformVariables, PDMats

# model
@model function correlation_cholesky(J, N, y)
  sigma ~ filldist(truncated(Cauchy(0., 5.); lower = 0), J)
  # LKJ work around
  n, eta = J, 1.0

  trans = CorrCholeskyFactor(n)
  R_tilde ~ filldist(Flat(), dimension(trans))
  R_U, logdetJ = transform_and_logjac(trans, R_tilde)
  F = Cholesky(R_U)
  Turing.@addlogprob! logpdf(LKJCholesky(n, eta), F) + logdetJ
  Σ_L = LowerTriangular(collect((sigma .+ 1e-6) .* R_U'))
  Sigma = PDMat(Cholesky(Σ_L))

  for i in 1:N
    y[i,:] ~ MvNormal(Sigma)
  end
  return (; Omega = Matrix(F))
end

Then sample the model:

mod = correlation_cholesky(J, N, y)

chain_chol = sample(mod, NUTS(0.65), 2000)

It’s here that the error commonly occurs. Any info would be appreciated.


This seems to happen when a divergence is hit during warm-up. A divergence can cause the parameters to fly off to infinity in unconstrained space. For the Cholesky factors, flying to infinity could result in the diagonal of the Cholesky factor containing a zero. When this happens, the covariance matrix is no longer positive definite and is instead positive semi-definite, which causes PDMats.invquad to throw the error you are seeing as it tries to call ldiv! with the triangular Cholesky factor.
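
As a minimal illustration of that failure mode (my own sketch, not code from the model above): a Cholesky factor with a zero on its diagonal can still be wrapped in a PDMat, but the triangular solve inside the MvNormal log-density then throws the SingularException.

using LinearAlgebra, PDMats, Distributions

# a lower-triangular "Cholesky factor" with a zero on the diagonal:
# the matrix it represents is only positive semi-definite
L = LowerTriangular([1.0 0.0; 0.5 0.0])

# PDMat wraps the factorization without (as far as I can tell) checking positive definiteness
Σ = PDMat(Cholesky(L))

# the Mahalanobis term requires a triangular solve against L, which throws SingularException
logpdf(MvNormal(zeros(2), Σ), [0.1, 0.2])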

I suspect Stan samples this just fine for at least one of two reasons:

  1. While Turing uses a default target acceptance rate (delta) of 0.65, Stan uses 0.8. A higher target acceptance rate decreases the number of divergences in general, so probably also during warm-up.
  2. Stan may have checks so that instead of simply throwing an error in this case, a log-density of -Inf is returned, which would be logged as a divergence.

We can try both. When I use NUTS(0.8), after 20 runs, I never once saw the error.
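
For the first option, that just means passing the higher target acceptance rate when constructing the sampler:

# option 1: raise the target acceptance rate from Turing's default of 0.65 to Stan's 0.8
chain_chol = sample(mod, NUTS(0.8), 2000)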

Here’s how we could use the 2nd option:

# model
@model function correlation_cholesky(J, N, y)
  sigma ~ filldist(truncated(Cauchy(0., 5.); lower = 0), J)
  # LKJ work around
  n, eta = J, 1.0

  trans = CorrCholeskyFactor(n)
  R_tilde ~ filldist(Flat(), dimension(trans))
  R_U, logdetJ = transform_and_logjac(trans, R_tilde)
  F = Cholesky(R_U)
  Turing.@addlogprob! logpdf(LKJCholesky(n, eta), F) + logdetJ
  Σ_L = LowerTriangular(collect((sigma .+ 1e-6) .* R_U'))
  Sigma = PDMat(Cholesky(Σ_L))

  # reject the proposal when the Cholesky factor has a zero on its diagonal
  # (i.e. Sigma is singular) by marking the log-density as -Inf
  if any(i -> iszero(Σ_L[i]), diagind(Σ_L))
    Turing.@addlogprob! -Inf
  else
    for i in 1:N
      y[i,:] ~ MvNormal(Sigma)
    end
  end
  return (; Omega = Matrix(F))
end

When I sample with this model, I sometimes get warnings of numerical errors, but these seem to occur only during warm-up, so no problems there.

This is great and a really informative solution as well! Thanks Seth. If you don’t mind, a couple of follow-up questions:

  1. How did you notice this was happening during warm-up and not during the actual sampling of the target distribution?
  2. If Stan possibly catches this error and returns an infinite log-density instead, do you think Turing should also handle this case whenever F ~ LKJCholesky(n, eta) is officially implemented?

Thanks for the help

Marking the log-density as -Inf gets logged as a :numerical_error in the chain (any value that fails the isfinite check for the log-density or its gradient does), so if we see numerical-error warnings during sampling but find no numerical errors in the chain afterwards (with sum(chain_chol[:numerical_error])), then the divergences must have occurred during warm-up.
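
Concretely, the check on the chain sampled above is just:

# count the numerical errors recorded in the (post-warm-up) chain;
# zero here means the divergences we were warned about all happened during warm-up
sum(chain_chol[:numerical_error])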

Hm, not certain. Turing could put a try...catch around all log-density evaluations and mark any errors as divergences; I’m guessing that would be closer to what Stan does. But it would probably be less efficient and would also make debugging Turing models much harder. Another Turing dev might have a more useful answer, though.
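
To make the idea concrete, here’s a rough sketch (this is not Turing’s actual internals, and safe_logdensity is just a made-up name for illustration):

# wrap a log-density evaluation so that any thrown error becomes a -Inf
# log-density (i.e. a rejected, divergent proposal) instead of crashing the sampler
function safe_logdensity(logdensity, θ)
  try
    return logdensity(θ)
  catch
    return -Inf
  end
end

The downside is exactly the one mentioned above: a genuine bug in the model would silently show up as divergences rather than as an error you can debug.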


Thanks again Seth, this is really helpful.
