# Fitting an observed value to a Binomial distribution in Turing

I am new to Turing and am trying to learn it by replicating the chapters from the book: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers .
Here is my problem (from Chapter 2):

1.) I have a probability p ~ Uniform(0,1), from which I have to infer the "number_of_cheaters"
2.) p_skewed = p*(0.5) + 0.25 (a deterministic value)
3.) the observed data enters the model as:

```python
with model:
    yes_responses = pm.Binomial("number_of_cheaters", 100, p_skewed, observed=35)
```
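For context, the 0.5·p + 0.25 skew comes from Chapter 2's coin-flip privacy mechanism (my reading of the book, not stated in the thread): each respondent flips a coin, answers truthfully on heads, and on tails flips again and answers "yes" on heads and "no" on tails, so P(yes) = 0.5·p + 0.25. A quick Monte Carlo sketch confirms the algebra:

```julia
# Coin-flip privacy mechanism (assumed from Chapter 2 of the book):
# heads -> answer truthfully; tails -> flip again and answer "yes" on
# heads, "no" on tails, giving P(yes) = 0.5*p + 0.25.
privacy_answer(p) = rand() < 0.5 ? rand() < p : rand() < 0.5

p_true = 0.3   # hypothetical true cheating rate, just for this check
frac_yes = count(_ -> privacy_answer(p_true), 1:1_000_000) / 1_000_000
println(frac_yes)   # ≈ 0.5*0.3 + 0.25 = 0.40
```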

How do I write this in Turing ?

This is how far I have come:

```julia
@model function model(yes_responses, N)
    p ~ Uniform(0, 1)
    p_skewed = p*(0.5) + 0.25
    rv_yes_responses ~ Binomial(N, p_skewed)

    # This is the part I cannot figure out: how to include yes_responses

    return p_skewed
end
```

Here’s how I would model this. The “magic” is to use `p_skewed ~ Dirac(p*(0.5)+0.25)`, which places `p_skewed` in the summary of the chains. Note that you can directly observe the value of `yes_responses` with the line `yes_responses ~ Binomial(N, p_skewed)`.

```julia
using Turing

@model function cheating_model(yes_responses, N)
    p ~ Uniform(0, 1)
    p_skewed ~ Dirac(p*(0.5) + 0.25)       # Use <var> ~ Dirac(<val>) to make it visible in the chain
    yes_responses ~ Binomial(N, p_skewed)  # Observe yes_responses
end

yes_responses = 35  # There are 35 yes responses
N = 100             # out of 100 total responses
model = cheating_model(yes_responses, N)

chns = sample(model, PG(5), 1_000)

display(chns)
# display(plot(chns))
```

Here’s the output of this model:

```
Summary Statistics
  parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

           p    0.2072    0.0914     0.0009    0.0012   5132.6224    0.9999     3821.7590
    p_skewed    0.3536    0.0457     0.0005    0.0006   5132.6224    0.9999     3821.7590

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           p    0.0414    0.1403    0.2045    0.2686    0.3943
    p_skewed    0.2707    0.3201    0.3523    0.3843    0.4471
```
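As a quick sanity check (my own addition, not part of the original answer): because the `Dirac` constraint makes `p_skewed` a deterministic function of `p`, the posterior summaries of the two parameters must be linked by the same linear relation, which the table above bears out:

```julia
# p_skewed = 0.5p + 0.25 is deterministic, so the posterior mean and
# std must transform the same way (values taken from the summary above).
mean_p, std_p = 0.2072, 0.0914
println(0.5 * mean_p + 0.25)   # 0.3536 -> matches the mean of p_skewed
println(0.5 * std_p)           # 0.0457 -> matches the std of p_skewed
```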

Below is an alternative model which, instead of the “Dirac trick”, uses `p_skewed` as a return value. This means that `p_skewed` is not shown in the chains, so we have to use the `generated_quantities` function to extract that information (and we must first weed out some of the values to get just the return value). `summarystats` is then used to print the statistics of those values.

```julia
@model function cheating_model2(yes_responses, N)
    p ~ Uniform(0, 1)
    p_skewed = p*(0.5) + 0.25              # This is not in the chains
    yes_responses ~ Binomial(N, p_skewed)  # Observe yes_responses

    return p_skewed
end

yes_responses = 35
N = 100
model = cheating_model2(yes_responses, N)

chns = sample(model, PG(5), 1_000)
display(chns)

# display(plot(chns))

# Filter out the relevant values to show the values of p_skewed
chains_params = Turing.MCMCChains.get_sections(chns, :parameters)
genq = generated_quantities(model, chains_params)
# Show some statistics of the return value
display(summarystats(vcat(genq...)))
```

Here’s the output of that model:

```
Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64       Float64

           p    0.2187    0.0862     0.0027    0.0018   497.5426    0.9995      649.5334

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           p    0.0555    0.1619    0.2182    0.2724    0.4071

Summary Stats:
Length:         1000
Missing Count:  0
Mean:           0.359366
Minimum:        0.252196
1st Quartile:   0.330973
Median:         0.359121
3rd Quartile:   0.386188
Maximum:        0.571029
```
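For comparison (an addition of mine, not from the thread): inverting the deterministic relation recovers the chain's estimate of `p` from the returned `p_skewed` values, so the two summaries above are consistent with each other:

```julia
# Invert p_skewed = 0.5p + 0.25 to back out a point estimate of the
# cheating rate from the mean of the returned p_skewed values.
mean_p_skewed = 0.359366            # from the summarystats above
p_hat = 2 * (mean_p_skewed - 0.25)
println(p_hat)                      # ≈ 0.2187, the chain's mean of p
```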

I personally prefer using `Dirac` for this kind of problem, but some samplers have trouble with it; the `PG` sampler mostly works. If that fails, the “return trick” works.


Thank you so much. I actually went with the second solution, but the first one is really interesting!