I’m having trouble with a number of models which all involve a mixture model with binary indicator variables. I’m reasonably sure they are all suffering from the same underlying issue, and I’m semi-convinced the problem is with the sampling (as opposed to model specification). I still do not have a good feel for which samplers to use in what situations. Any advice on the sampling, or the models, would be appreciated.
You can think of this as a bunch of test scores, out of 40. We want to know if people are performing at chance (i.e. did not study, z = 0, with success rate ψ = 0.5) or if they did study (z = 1). And if they did study, what is the performance of the study group, ϕ?
```julia
using Turing, StatsPlots

n_chains = 8
k = [21 17 21 18 22 31 31 34 34 35 35 36 39 36 35]
n = 40

@model function model(k, n)
    ψ = 0.5
    ϕ ~ Uniform(0.5, 1)
    z ~ filldist(Bernoulli(0.5), length(k))
    for i in eachindex(k)
        k[i] ~ Binomial(n, z[i] == 1 ? ϕ : ψ)
    end
end

chains = sample(model(k, n), PG(100, 1), MCMCThreads(), 5000, n_chains)

# plot(chains)
density(vec(chains[:ϕ]), xlim=(0, 1), lw=3, legend=false, xlabel="ϕ", ylabel="Posterior density")
```
It should be the case that ϕ is highly peaked at about 0.86, and that the first 5 people did not study (z=0) and the rest did study (z=1). But the posterior over the indicator variables is not correct, and the resulting distribution of ϕ is pretty off.
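For reference, the expected answer can be checked without any MCMC. The following is a minimal sketch in plain Julia (no Turing) that sums each z[i] out of the likelihood and evaluates the posterior of ϕ on a grid. The helper names (`loglik`, `logmix`, `resp`), the grid size `m`, and the use of a vector for `k` are my own choices for illustration. The grid is only an approximation.

```julia
# Grid approximation of the posterior with each z[i] summed out.
# Plain Julia, no packages; helper names are illustrative only.

k = [21, 17, 21, 18, 22, 31, 31, 34, 34, 35, 35, 36, 39, 36, 35]
n = 40
ψ = 0.5

# Binomial log-likelihood up to the binomial coefficient, which is the
# same for both mixture components and therefore cancels.
loglik(ki, p) = ki * log(p) + (n - ki) * log1p(-p)

# Midpoint grid over the support of the Uniform(0.5, 1) prior on ϕ.
m = 2000
ϕ_grid = [0.5 + 0.5 * (j - 0.5) / m for j in 1:m]

# log(0.5 * L(ki | ϕ) + 0.5 * L(ki | ψ)), computed stably.
function logmix(ki, ϕ)
    a = loglik(ki, ϕ)
    b = loglik(ki, ψ)
    hi = max(a, b)
    return hi + log(0.5 * exp(a - hi) + 0.5 * exp(b - hi))
end

# Normalised posterior weights for ϕ on the grid (flat prior).
logpost = [sum(logmix(ki, ϕ) for ki in k) for ϕ in ϕ_grid]
w = exp.(logpost .- maximum(logpost))
w ./= sum(w)

ϕ_mean = sum(w .* ϕ_grid)

# P(z[i] = 1 | k[i], ϕ) under the Bernoulli(0.5) prior on z[i].
resp(ki, ϕ) = 1 / (1 + exp(loglik(ki, ψ) - loglik(ki, ϕ)))

# Average over the posterior of ϕ to get P(z[i] = 1 | k).
p_study = [sum(w .* resp.(ki, ϕ_grid)) for ki in k]

println("posterior mean of ϕ ≈ ", round(ϕ_mean, digits=3))
println("P(z = 1 | k) ≈ ", round.(p_study, digits=3))
```

This works because, given ϕ, the indicators are independent and each depends only on its own score. It gives a sampler-free baseline to compare the `PG` chains against.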