Thanks @sethaxen @michielver @bdeonovic. I spent the whole weekend tweaking about 15 variants of the model, but nothing worked, and I was stuck on this part of the chapter for many hours. Finally I have my model working! I now get a sampling time of about 12 minutes, compared to Stan’s roughly 2 minutes, so at least the efficiency is of the same order of magnitude. I learned many things from you all. In particular:
using ReverseDiff, Memoization
Turing.setadbackend(:reversediff)
Turing.setrdcache(true)
α ~ filldist(MvNormal(zeros(4), Sigma_actor), 7)
# alternatively
α ~ filldist(MvNormal(Sigma_actor), 7)
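A note for anyone copying those two lines (my own remark, not from the replies): both forms define seven i.i.d. columns drawn from a 4-dimensional multivariate normal with covariance Sigma_actor; MvNormal(Σ) is simply the zero-mean constructor in Distributions.jl. A quick check with a placeholder covariance:
using Turing, LinearAlgebra  # Turing exports filldist and MvNormal
Sigma_actor = Matrix(1.0I, 4, 4)                   # placeholder covariance, for illustration only
d1 = filldist(MvNormal(zeros(4), Sigma_actor), 7)  # explicit zero mean
d2 = filldist(MvNormal(Sigma_actor), 7)            # zero-mean constructor, same distribution
size(rand(d1)) == size(rand(d2)) == (4, 7)         # 7 iid columns of a 4-dimensional MvNormal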
Thanks @sethaxen for the comprehensive explanation. I noted all of your comments, including the ones about the priors and adapt_delta. I did experiment with Turing’s default adapt_delta = 0.65, and it speeds sampling up by about 2 minutes. Furthermore, it seems that
quad_form_diag(M, v) = Symmetric((v .* v') .* (M .+ M') ./ 2)
also helps to speed things up a lot, though I had no idea what that line does; I just copied and pasted it (see my attempt at an explanation just below).
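As far as I can tell (my own reading, so please correct me if I am wrong), that helper mimics Stan’s quad_form_diag(M, v) = diag_matrix(v) * M * diag_matrix(v): it turns a correlation matrix M and a scale vector v into a covariance matrix, symmetrising M and wrapping the result in Symmetric so that cholesky/MvNormal do not choke on tiny numerical asymmetries. A small check with made-up values:
using LinearAlgebra
quad_form_diag(M, v) = Symmetric((v .* v') .* (M .+ M') ./ 2)
ρ = [1.0 0.3; 0.3 1.0]   # example correlation matrix
σ = [0.5, 2.0]           # example scales
quad_form_diag(ρ, σ) ≈ Diagonal(σ) * ρ * Diagonal(σ)   # true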
You also advised that the computation of logit_p allocates 4 vectors, but that it could be done with just one allocation. I understand @inbounds, but I was also not sure what this block does; again, I copied and pasted it into my model and it works (my attempt at an explanation is after the code below). I need to read up on the Julia docs.
logit_p = broadcast(treatment, actor, block) do t, a, b
return @inbounds g[t] + α[t, a] + β[t, b]
end
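My understanding of that block (my gloss, not @sethaxen’s wording): broadcast with a do-block walks treatment, actor and block in lockstep and computes one logit per observation, so only the output vector logit_p is allocated, instead of materialising g[treatment], α[...] and β[...] as separate temporaries. Ignoring the element type needed for automatic differentiation, it is roughly equivalent to:
logit_p = Vector{Float64}(undef, length(treatment))  # single allocation for the result
for i in eachindex(treatment, actor, block)
    t, a, b = treatment[i], actor[i], block[i]
    logit_p[i] = g[t] + α[t, a] + β[t, b]
end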
I am also constructing another non-centered version of the model above, using a Cholesky factorisation to build up the variance-covariance matrix. Here is a summary of the data and the key parts of the model I am trying to implement; only some parts of the full model are shown below.
# Summary of the key parts of the Turing model; this snippet cannot be run as-is
n_treatment, n_actor, n_block = 4, 7, 6
# adaptive NON-CENTERED priors
ρ_actor ~ LKJ(n_treatment, 2)
σ_actor ~ filldist(Exponential(1), n_treatment)
# I am not sure what these two lines do; I just copied them from the TuringModels.jl model above
ρ_actor = (ρ_actor' + ρ_actor) / 2
# Cholesky factor
L_actor = cholesky(ρ_actor).L
# Standardised prior
z_actor ~ filldist(Normal(0, 1), n_treatment, n_actor)
α = σ_actor .* L_actor * z_actor # 4 x 7 matrix of actor effects (see the sanity check after this snippet)
# The linear model
α[t,a] + # more terms not shown
# likelihood
pulled_left .~ BinomialLogit.(1, logit_p)
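A sanity check I ran on the non-centered construction α = σ_actor .* L_actor * z_actor (my own sketch, with made-up σ and ρ values): the columns of α should end up with covariance Diagonal(σ) * ρ * Diagonal(σ), i.e. exactly the kind of covariance matrix quad_form_diag builds explicitly.
using LinearAlgebra, Statistics
σ = [0.5, 1.0, 1.5, 2.0]                          # example scales
ρ = Matrix(1.0I, 4, 4); ρ[1, 2] = ρ[2, 1] = 0.3   # example correlation matrix
L = cholesky(Symmetric(ρ)).L
z = randn(4, 100_000)                             # standardised draws
α = σ .* (L * z)
cov(α; dims = 2)                                  # ≈ Diagonal(σ) * ρ * Diagonal(σ), up to Monte Carlo error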
The Stan model, sampling 4 chains, finished in about 6 seconds per chain (Total = Warm-up + Sampling). Summarized elapsed times:
Chain 1: Elapsed Time: 3.27789 seconds (Warm-up)
Chain 1: 2.63135 seconds (Sampling)
Chain 1: 5.90924 seconds (Total)
Chain 2: Elapsed Time: 3.27674 seconds (Warm-up)
Chain 2: 2.77611 seconds (Sampling)
Chain 2: 6.05285 seconds (Total)
Chain 3: Elapsed Time: 3.44 seconds (Warm-up)
Chain 3: 2.66887 seconds (Sampling)
Chain 3: 6.10887 seconds (Total)
Chain 4: Elapsed Time: 3.4264 seconds (Warm-up)
Chain 4: 2.63844 seconds (Sampling)
Chain 4: 6.06484 seconds (Total)
Stan model
Stan code saved in “nonCentered.stan”
data{
int L[504];
int block_id[504];
int actor[504];
int tid[504];
}
parameters{
matrix[4,7] z_actor;
matrix[4,6] z_block;
vector[4] g;
cholesky_factor_corr[4] L_Rho_actor;
cholesky_factor_corr[4] L_Rho_block;
vector<lower=0>[4] sigma_actor;
vector<lower=0>[4] sigma_block;
}
transformed parameters{
matrix[7,4] alpha;
matrix[6,4] beta;
beta = (diag_pre_multiply(sigma_block, L_Rho_block) * z_block)';
alpha = (diag_pre_multiply(sigma_actor, L_Rho_actor) * z_actor)';
}
model{
vector[504] p;
sigma_block ~ exponential( 1 );
sigma_actor ~ exponential( 1 );
L_Rho_block ~ lkj_corr_cholesky( 2 );
L_Rho_actor ~ lkj_corr_cholesky( 2 );
g ~ normal( 0 , 1 );
to_vector( z_block ) ~ normal( 0 , 1 );
to_vector( z_actor ) ~ normal( 0 , 1 );
for ( i in 1:504 ) {
p[i] = g[tid[i]] + alpha[actor[i], tid[i]] + beta[block_id[i], tid[i]];
p[i] = inv_logit(p[i]);
}
L ~ binomial( 1 , p );
}
generated quantities{
vector[504] log_lik;
vector[504] p;
matrix[4,4] Rho_actor;
matrix[4,4] Rho_block;
Rho_block = multiply_lower_tri_self_transpose(L_Rho_block);
Rho_actor = multiply_lower_tri_self_transpose(L_Rho_actor);
for ( i in 1:504 ) {
p[i] = g[tid[i]] + alpha[actor[i], tid[i]] + beta[block_id[i], tid[i]];
p[i] = inv_logit(p[i]);
}
for ( i in 1:504 ) log_lik[i] = binomial_lpmf( L[i] | 1 , p[i] );
}
Sampling Stan model
library(rethinking)
data(chimpanzees)
d = chimpanzees
d$block_id = d$block
d$treatment = 1L + d$prosoc_left + 2L*d$condition
dat = list(
L = d$pulled_left,
tid = d$treatment,
actor = d$actor,
block_id = as.integer(d$block_id) )
nonCentered = stan_model("nonCentered.stan")
fit = sampling(nonCentered, data = dat, seed = 803214053, control = list(adapt_delta = 0.9))
SAMPLING FOR MODEL 'nonCentered' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.000133 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.33 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 3.27789 seconds (Warm-up)
Chain 1: 2.63135 seconds (Sampling)
Chain 1: 5.90924 seconds (Total)
Chain 1:
SAMPLING FOR MODEL 'nonCentered' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 9.7e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.97 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 3.27674 seconds (Warm-up)
Chain 2: 2.77611 seconds (Sampling)
Chain 2: 6.05285 seconds (Total)
Chain 2:
SAMPLING FOR MODEL 'nonCentered' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 9.7e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.97 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 3.44 seconds (Warm-up)
Chain 3: 2.66887 seconds (Sampling)
Chain 3: 6.10887 seconds (Total)
Chain 3:
SAMPLING FOR MODEL 'nonCentered' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 0.000111 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1.11 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 3.4264 seconds (Warm-up)
Chain 4: 2.63844 seconds (Sampling)
Chain 4: 6.06484 seconds (Total)
Chain 4:
Turing’s reference model
To implement it in Turing, I found a similar model here: StatisticalRethinkingJulia/TuringModels.jl - non-centered chimpanzees.md.
using TuringModels
using CSV, DataFrames  # needed below for CSV.read and DataFrame; added in case TuringModels does not re-export them
using LinearAlgebra
# This script requires latest LKJ bijectors support.
# `] add Bijectors#master` to get latest Bijectors.
data_path = joinpath(@__DIR__, "..", "..", "data", "chimpanzees.csv")
delim = ";"
d = CSV.read(data_path, DataFrame; delim)
d.block_id = d.block
# m13.6nc1 is equivalent to m13.6nc in the following Turing model
@model m13_6_nc(actor, block_id, condition, prosoc_left, pulled_left) = begin
# fixed priors
Rho_block ~ LKJ(3, 4.)
Rho_actor ~ LKJ(3, 4.)
sigma_block ~ filldist(truncated(Cauchy(0, 2), 0, Inf), 3)
sigma_actor ~ filldist(truncated(Cauchy(0, 2), 0, Inf), 3)
a ~ Normal(0, 1)
bp ~ Normal(0, 1)
bpc ~ Normal(0, 1)
# adaptive NON-CENTERED priors
Rho_block = (Rho_block' + Rho_block) / 2
Rho_actor = (Rho_actor' + Rho_actor) / 2
L_Rho_block = cholesky(Rho_block).L
L_Rho_actor = cholesky(Rho_actor).L
z_N_block ~ filldist(Normal(0, 1), 3, 6)
z_N_actor ~ filldist(Normal(0, 1), 3, 7)
# @show size(L_Rho_block) size(sigma_block) size(z_N_block)
a_block_bp_block_bpc_block = sigma_block .* L_Rho_block * z_N_block
a_actor_bp_actor_bpc_actor = sigma_actor .* L_Rho_actor * z_N_actor
a_block = a_block_bp_block_bpc_block[1, :]
bp_block = a_block_bp_block_bpc_block[2, :]
bpc_block = a_block_bp_block_bpc_block[3, :]
a_actor = a_actor_bp_actor_bpc_actor[1, :]
bp_actor = a_actor_bp_actor_bpc_actor[2, :]
bpc_actor = a_actor_bp_actor_bpc_actor[3, :]
# linear models
BPC = bpc .+ bpc_actor[actor] + bpc_block[block_id]
BP = bp .+ bp_actor[actor] + bp_block[block_id]
A = a .+ a_actor[actor] + a_block[block_id]
logit_p = A + (BP + BPC .* condition) .* prosoc_left
# likelihood
pulled_left .~ BinomialLogit.(1, logit_p)
end
chns = sample(
m13_6_nc(d.actor, d.block_id, d.condition, d.prosoc_left, d.pulled_left),
Turing.NUTS(0.95),
1000
)
The model I implemented to replicate Stan’s model above
To implement the Stan model, I also referred to @sethaxen 's implementation above and to the TuringModels.jl model. Again, I coded up more than 10 variants, but they all ran into the same issues.
using TuringModels, CSV, DataFrames, Turing, LinearAlgebra, Random, ReverseDiff, Memoization
Turing.setadbackend(:reversediff)
Turing.setrdcache(true)
data_path = joinpath(TuringModels.project_root, "data", "chimpanzees.csv");
df = CSV.read(data_path, DataFrame; delim=';');
df.block_id = df.block;
df.treatment = 1 .+ df.prosoc_left .+ 2*df.condition;
@model function nonCentered(actor, block, treatment, pulled_left)
n_treatment, n_actor, n_block = 4, 7, 6
# fixed priors
g ~ MvNormal(n_treatment, 1)
ρ_block ~ LKJ(n_treatment, 2)
ρ_actor ~ LKJ(n_treatment, 2)
σ_block ~ filldist(Exponential(1), n_treatment)
σ_actor ~ filldist(Exponential(1), n_treatment)
# adaptive NON-CENTERED priors
# I am not sure what these two lines do; I just copied them from the TuringModels.jl model above
ρ_block = (ρ_block' + ρ_block) / 2
ρ_actor = (ρ_actor' + ρ_actor) / 2
# Cholesky factor
L_block = cholesky(ρ_block).L
L_actor = cholesky(ρ_actor).L
# Standardised prior
z_block ~ filldist(Normal(0, 1), n_treatment, n_block)
z_actor ~ filldist(Normal(0, 1), n_treatment, n_actor)
# Again, copied from the model of TuringModels.jl above
β = σ_block .* L_block * z_block # matrix 4 x 6
α = σ_actor .* L_actor * z_actor # matrix 4 x 7
# Use the code from @sethaxen
logit_p = broadcast(treatment, actor, block) do t, a, b
return @inbounds g[t] + α[t,a] + β[t,b]
end
pulled_left .~ BinomialLogit.(1, logit_p)
end
rng = MersenneTwister(3702)
@time chns = sample(
rng,
nonCentered(df.actor, df.block_id, df.treatment, df.pulled_left),
NUTS(), # Use Turing's default settings
1_000
);
julia> @time chns = sample(
rng,
nonCentered(df.actor, df.block_id, df.treatment, df.pulled_left),
NUTS(), # Use Turing's default settings
1_000
);
┌ Info: Found initial step size
└ ϵ = 0.2
┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC ~/.julia/packages/AdvancedHMC/MIxdK/src/hamiltonian.jl:47
┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC ~/.julia/packages/AdvancedHMC/MIxdK/src/hamiltonian.jl:47
Sampling 100%|██████████████████████████████████| Time: 0:01:30
ERROR: DomainError with -5.831779503751022e-11:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
Stacktrace:
[1] sqrt
@ ./math.jl:582 [inlined]
[2] sqrt
@ ~/.julia/packages/ForwardDiff/QOqCN/src/dual.jl:203 [inlined]
[3] derivative!
@ ~/.julia/packages/ForwardDiff/QOqCN/src/derivative.jl:46 [inlined]
[4] unary_scalar_forward_exec!(f::typeof(sqrt), output::ReverseDiff.TrackedReal{Float64, Float64, Nothing}, input::ReverseDiff.TrackedReal{Float64, Float64, Nothing}, cache::Base.RefValue{Float64})
@ ReverseDiff ~/.julia/packages/ReverseDiff/E4Tzn/src/derivatives/scalars.jl:91
[5] scalar_forward_exec!(instruction::ReverseDiff.ScalarInstruction{typeof(sqrt), ReverseDiff.TrackedReal{Float64, Float64, Nothing}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, Base.RefValue{Float64}})
@ ReverseDiff ~/.julia/packages/ReverseDiff/E4Tzn/src/derivatives/scalars.jl:81
[6] forward_exec!(instruction::ReverseDiff.ScalarInstruction{typeof(sqrt), ReverseDiff.TrackedReal{Float64, Float64, Nothing}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, Base.RefValue{Float64}})
@ ReverseDiff ~/.julia/packages/ReverseDiff/E4Tzn/src/tape.jl:82
[7] ForwardExecutor
@ ~/.julia/packages/ReverseDiff/E4Tzn/src/api/tape.jl:76 [inlined]
[8] (::FunctionWrappers.CallWrapper{Nothing})(f::ReverseDiff.ForwardExecutor{ReverseDiff.ScalarInstruction{typeof(sqrt), ReverseDiff.TrackedReal{Float64, Float64, Nothing}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, Base.RefValue{Float64}}})
@ FunctionWrappers ~/.julia/packages/FunctionWrappers/8xdVB/src/FunctionWrappers.jl:58
[9] macro expansion
@ ~/.julia/packages/FunctionWrappers/8xdVB/src/FunctionWrappers.jl:130 [inlined]
[10] do_ccall
@ ~/.julia/packages/FunctionWrappers/8xdVB/src/FunctionWrappers.jl:118 [inlined]
[11] FunctionWrapper
@ ~/.julia/packages/FunctionWrappers/8xdVB/src/FunctionWrappers.jl:137 [inlined]
[12] forward_pass!(compiled_tape::ReverseDiff.CompiledTape{ReverseDiff.GradientTape{Turing.Core.var"#f#26"{DynamicPPL.TypedVarInfo{NamedTuple{(:g, :ρ_block, :ρ_actor, :σ_block, :σ_actor, :z_block, :z_actor), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:g, Tuple{}}, Int64}, Vector{
Comparing Stan with my model implemented in Turing, I noted a few things that might be causing the issue.
Stan has lkj_corr_cholesky
Stan’s implementation of lkj_corr_cholesky
As per 24 Correlation Matrix Distributions | Stan Functions Reference
The correlation matrix distributions have support on the (Cholesky factors of) correlation matrices. A Cholesky factor L for a K × K correlation matrix Σ of dimension K has rows of unit length so that the diagonal of LL⊤ is the unit K-vector. Even though models are usually conceptualized in terms of correlation matrices, it is better to operationalize them in terms of their Cholesky factors.
As per 24.1 LKJ Correlation Distribution | Stan Functions Reference, with reference to the LKJ distribution (without Cholesky factorisation):
However, it is much better computationally to work directly with the Cholesky factor of Σ, so this distribution should never be explicitly used in practice.
As per 24.2 Cholesky LKJ Correlation Distribution | Stan Functions Reference
Stan provides an implicit parameterization of the LKJ correlation matrix density in terms of its Cholesky factor, which you should use rather than the explicit parameterization in the previous section. For example, if L is a Cholesky factor of a correlation matrix, then
L ~ lkj_corr_cholesky(2.0); # implies L * L' ~ lkj_corr(2.0);
parameters{
.............
cholesky_factor_corr[4] L_Rho_block;
}
model{
...............
L_Rho_actor ~ lkj_corr_cholesky( 2 );
}
Compared to Turing’s
ρ_actor ~ LKJ(n_treatment, 2)
Building the covariance matrix
I am not sure if my implementation is correct. This is Stan's version:
transformed parameters{
............
alpha = (diag_pre_multiply(sigma_actor, L_Rho_actor) * z_actor)';
}
Compared to my Turing version (see my note after this snippet):
# I am not sure what these two lines do; I just copied them from the TuringModels.jl model above
ρ_actor = (ρ_actor' + ρ_actor) / 2
# Cholesky factor
L_actor = cholesky(ρ_actor).L
# Standardised prior
z_actor ~ filldist(Normal(0, 1), n_treatment, n_actor)
α = σ_actor .* L_actor * z_actor # matrix 4 x 7
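My tentative reading of the difference (please correct me if this is wrong): Stan's diag_pre_multiply(sigma_actor, L_Rho_actor) is diag_matrix(sigma_actor) * L_Rho_actor, which in Julia is just σ_actor .* L_actor, so apart from the transpose the two constructions should agree. The (ρ' + ρ) / 2 line seems to be there only to force exact symmetry before cholesky(), which throws if the matrix is not symmetric to the last bit. A small sketch of the equivalence with made-up values:
using LinearAlgebra
σ = [0.5, 1.0, 1.5, 2.0]
ρ = Matrix(1.0I, 4, 4); ρ[1, 2] = ρ[2, 1] = 0.3
L = cholesky(Symmetric(ρ)).L
σ .* L ≈ Diagonal(σ) * L   # Julia analogue of Stan's diag_pre_multiply(sigma, L)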
Final attempt before I made this post
Given the points above, I suspect lkj_corr_cholesky is the issue. Is there no such thing in Julia or Turing? So I bumped the LKJ prior up to be very restrictive, although that seems like the wrong thing to do. I changed the LKJ priors from
ρ_block ~ LKJ(n_treatment, 2)
ρ_actor ~ LKJ(n_treatment, 2)
to these
ρ_block ~ LKJ(n_treatment, 1_000)
ρ_actor ~ LKJ(n_treatment, 1_000)
and then sampled again:
julia> @time chns = sample(
rng,
nonCentered(df.actor, df.block_id, df.treatment, df.pulled_left),
NUTS(), # Use Turing's default settings
1_000
);
┌ Info: Found initial step size
└ ϵ = 0.025
┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC ~/.julia/packages/AdvancedHMC/MIxdK/src/hamiltonian.jl:47
┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC ~/.julia/packages/AdvancedHMC/MIxdK/src/hamiltonian.jl:47
Sampling 100%|██████████████████████████████████| Time: 0:07:48
482.639091 seconds (161.88 M allocations: 30.543 GiB, 1.65% gc time, 4.60% compilation time)
The sampling then works, but I have not validated the inferences, and changing η from 2 to 1000 just does not seem like the right thing to do. Compared with Stan's efficiency (about 6 seconds per chain), sampling one chain here takes about 7 minutes; the runs are not directly comparable, but that is still orders of magnitude slower. And I have no idea why changing η makes the sampling work. Does anybody have any idea?
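One thing I noticed while experimenting, which may partly explain it (this is only a guess): with η = 1000 the LKJ prior concentrates very tightly around the identity matrix, so the sampled correlation matrices stay comfortably positive definite, whereas with η = 2 the sampler can reach nearly singular matrices where cholesky() and the gradient become numerically fragile. You can see how restrictive η = 1000 is from a couple of draws:
using Distributions, LinearAlgebra
R_tight = rand(LKJ(4, 1000))
maximum(abs.(R_tight - I))  # tiny: off-diagonal correlations are all near zero
R_loose = rand(LKJ(4, 2))
maximum(abs.(R_loose - I))  # typically much larger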
LKJ Cholesky
Does Julia or Turing have an implementation of this? I could not find one. Both Stan and PyMC3 have implementations:
Stan 24.2 Cholesky LKJ Correlation Distribution | Stan Functions Reference
PyMC3: LKJ Cholesky Covariance Priors for Multivariate Normal Models — PyMC3 3.10.0 documentation
I see several similar Turing issues that might be related to this, for example:
Tracker does not work with multilevel model · Issue #1549 · TuringLang/Turing.jl
(Generalized) Linear Mixed Model Tutorial · Issue #1518 · TuringLang/Turing.jl
How to model Covariance matrices? · Discussion #1526 · TuringLang/Turing.jl
Without an LKJ Cholesky distribution, it seems very hard to model covariance matrices for multivariate distributions in Turing.
So while waiting for Julia or Turing to implement something like LKJ_Cholesky, is there any workaround to sample the model above? Can I call Stan's lkj_corr_cholesky from Turing?