I’m having trouble with several models that all involve a mixture with binary indicator variables. I’m reasonably sure they all suffer from the same underlying issue, and I’m semi-convinced the problem is with the sampling (as opposed to the model specification). I still do not have a good feel for which samplers to use in which situations. Any advice on the sampling, or the models, would be appreciated.
Example 1
You can think of this as a bunch of test scores, out of 40. We want to know if people are performing at chance (i.e. did not study, z=0 and ψ = 0.5) or if they did study (z=1). And if they did study, what is the performance of the study group, ϕ?
using Turing, StatsPlots
n_chains = 8
k = [21 17 21 18 22 31 31 34 34 35 35 36 39 36 35]
n = 40
@model function model(k, n)
    ψ = 0.5
    ϕ ~ Uniform(0.5, 1)
    z ~ filldist(Bernoulli(0.5), length(k))
    for i in eachindex(k)
        k[i] ~ Binomial(n, z[i] == 1 ? ϕ : ψ)
    end
end
chains = sample(model(k, n), PG(100, 1), MCMCThreads(), 5000, n_chains)
# plot(chains)
density(vec(chains[:ϕ]), xlim=(0, 1), lw=3, legend=false, xlabel="ϕ", ylabel="Posterior density")
It should be the case that ϕ is highly peaked on about 0.86, and that the first 5 people did not study (z=0) and the rest did study (z=1). But the posterior over the indicator variables is not correct, and the resulting distribution of ϕ is pretty off.
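For reference, the expectation above can be checked without any MCMC, because each z[i] can be summed out analytically and ϕ is one-dimensional. The last ten scores add up to 346 out of 400, i.e. 0.865, which is where the "about 0.86" comes from. Below is a minimal sketch of that check on a grid, assuming the same data and priors as the model above. The helper names (`binom_pmf`, `log_marginal`, `ϕ_grid`, `p_study`) and the grid resolution are my own choices for illustration and not part of Turing.

```julia
# Data from the post: scores out of n = 40 (written as a Vector here)
k = [21, 17, 21, 18, 22, 31, 31, 34, 34, 35, 35, 36, 39, 36, 35]
n = 40
ψ = 0.5

# Binomial pmf written out by hand so the sketch needs no packages (hypothetical helper)
binom_pmf(k, n, p) = binomial(n, k) * p^k * (1 - p)^(n - k)

# Grid over the support of the Uniform(0.5, 1) prior on ϕ (resolution is arbitrary)
ϕ_grid = range(0.5, 1.0; length = 2001)

# Log marginal likelihood of ϕ with every z[i] summed out under Bernoulli(0.5)
function log_marginal(ϕ)
    sum(log(0.5 * binom_pmf(ki, n, ϕ) + 0.5 * binom_pmf(ki, n, ψ)) for ki in k)
end

# Normalised grid weights; the prior is flat on the grid, so it drops out
logpost = log_marginal.(ϕ_grid)
w = exp.(logpost .- maximum(logpost))
w ./= sum(w)

ϕ_mean = sum(ϕ_grid .* w)
ϕ_mode = ϕ_grid[argmax(w)]

# P(z[i] = 1 | data): the conditional given ϕ, averaged over the grid posterior of ϕ
p_study = map(k) do ki
    like_study = binom_pmf.(ki, n, ϕ_grid)
    like_chance = binom_pmf(ki, n, ψ)
    sum(w .* like_study ./ (like_study .+ like_chance))
end

println("posterior mean of ϕ: ", round(ϕ_mean; digits = 3))
println("posterior mode of ϕ: ", round(ϕ_mode; digits = 3))
println("P(z = 1) per person: ", round.(p_study; digits = 3))
```

Comparing `p_study` and the grid weights `w` against the sampled `z` and `ϕ` from the chains gives a sampler-independent baseline, which should help separate a sampling problem from a model specification problem.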