Here’s how I would model this. The “magic” is to use p_skewed ~ Dirac(p*(0.5)+0.25)
which will place p_skewed
in the summary of the chains. Note that you can directly observe the value of yes_responses
by the line yes_reponses ~ Binomial(N,p_skewed)
.
using Turing
@model function cheating_model(yes_responses,N)
p ~ Uniform(0,1)
p_skewed ~ Dirac(p*(0.5)+0.25) # Use <var> ~ Dirac(<val>) to make it visible in the chain
yes_responses ~ Binomial(N,p_skewed) # Observe the yes_responses
end
yes_response = 35 # There are 35 yes responses
N = 100 # of total 100 responses
model = cheating_model(yes_response,N)
chns = sample(model, PG(5), 1_000)
display(chns)
# display(plot(chns))
Here’s the output of this model:
Summary Statistics
parameters mean std naive_se mcse ess rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
p 0.2072 0.0914 0.0009 0.0012 5132.6224 0.9999 3821.7590
p_skewed 0.3536 0.0457 0.0005 0.0006 5132.6224 0.9999 3821.7590
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
p 0.0414 0.1403 0.2045 0.2686 0.3943
p_skewed 0.2707 0.3201 0.3523 0.3843 0.4471
Below is an alternative model which - instead of the “Dirac trick” - use p_skewed
as a return value instead. This means that p_skewed
is not shown in the chains, so we have to use the generated_quantities
function to extract that information (and we must first weed out some of values to just get the return value). summarystats
is then used to print the statistics of the values.
@model function cheating_model2(yes_responses,N)
p ~ Uniform(0,1)
p_skewed = p*(0.5)+0.25 # This is not in the chains
yes_responses ~ Binomial(N,p_skewed) # Observe the yes_responses
return p_skewed
end
yes_response = 35
N = 100
model = cheating_model2(yes_response,N)
chns = sample(model, PG(5), 1_000)
display(chns)
# display(plot(chns))
# Filter out the relevant values to show the values of p_skewed
chains_params = Turing.MCMCChains.get_sections(chns, :parameters)
genq = generated_quantities(model, chains_params)
# Show some statistics of the return value
display(summarystats(vcat(genq...)))
Here’s the output of that model:
Summary Statistics
parameters mean std naive_se mcse ess rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
p 0.2187 0.0862 0.0027 0.0018 497.5426 0.9995 649.5334
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
p 0.0555 0.1619 0.2182 0.2724 0.4071
Summary Stats:
Length: 1000
Missing Count: 0
Mean: 0.359366
Minimum: 0.252196
1st Quartile: 0.330973
Median: 0.359121
3rd Quartile: 0.386188
Maximum: 0.571029
I personally prefer using Dirac
for this kind of problems, but some samplers have problems using it; the PG
sampler mostly works. If that fails then the “return trick” works.