I have a question about the Beta-Binomial Bayesian model. Let’s say I’d like to implement some quality control using this method. Based on a previous review of 10 widgets, I start with the prior belief that 90% of widgets are assembled correctly:
using Distributions
α_prior = 9
β_prior = 1
dist_prior = Beta(α_prior, β_prior)
I want to answer the following question: What is the probability that fewer than 90% of widgets are assembled correctly? Using my prior, I can compute cdf(dist_prior, 0.9)
which yields a value of 0.387, so I say there’s a 39% chance that fewer than 90% of my widgets are assembled correctly. Now, I draw a sample of 100 widgets and discover that 90 of them were assembled correctly, and 10 of them were assembled incorrectly. I update my belief as follows:
α = α_prior + 90
β = β_prior + 10
dist = Beta(α, β)
This time, when I calculate cdf(dist, 0.9)
I get 0.466, so I believe there’s a 47% chance that fewer than 90% of my widgets are assembled correctly. Why would it be more likely that the true success rate is less than 90% when my sample size is much larger?
I get that the math works out that way, and I sort of see why: as alpha and beta increase while the ratio between them stays the same, the Beta distribution becomes more concentrated around its mean, so more of the area under the curve sits close to the mean. Intuitively, though, this doesn’t make sense to me. I would have expected to become more confident as I take larger and larger samples.
I must be missing something…?
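To put numbers on the concentration described above, here is a minimal sketch comparing the mean, standard deviation, and median of the prior and the posterior. The name `dist_post` is just a label for this example; `mean`, `std`, and `median` are the usual accessors available after `using Distributions`.

```julia
using Distributions

dist_prior = Beta(9, 1)            # prior from the earlier review of 10 widgets
dist_post  = Beta(9 + 90, 1 + 10)  # after 90 correct and 10 incorrect widgets

# Print summary statistics for each distribution, rounded for readability.
for (label, d) in (("prior", dist_prior), ("posterior", dist_post))
    println(label,
            ": mean = ", round(mean(d); digits = 3),
            ", std = ", round(std(d); digits = 3),
            ", median = ", round(median(d); digits = 3))
end
```

Both distributions have a mean of 0.9, but the posterior has a much smaller standard deviation. In both cases the median lies above the mean, so less than half of the probability mass sits below 0.9.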
EDIT: Okay, I think I see what’s going on here. The threshold 0.9 is exactly the mean of both my prior and my posterior, so cdf(dist, 0.9)
measures the probability mass below the mean. Because the distribution is left-skewed, that mass is below 0.5, and it moves toward 0.5 as more data makes the distribution narrower and more symmetric. It is not a sign that I’m becoming less confident. If I use a threshold just below the mean, say 0.89, the output of cdf(dist, 0.89)
in fact decreases with larger sample sizes, as I expected. I guess I’ll leave this topic here in case I make the same silly mistake in the future and come here looking for answers.
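For future reference, here is a rough sketch that sweeps over a few sample sizes and evaluates the CDF at both thresholds. The helper `prob_below` and the sample sizes are hypothetical, and the sketch assumes every sample comes out at exactly 90% correct, which keeps the posterior mean at 0.9.

```julia
using Distributions

# Hypothetical helper: posterior P(p < threshold) after inspecting n widgets,
# assuming exactly 90% of the sample was assembled correctly.
function prob_below(threshold, n; α_prior = 9, β_prior = 1)
    successes = (9 * n) ÷ 10
    failures = n - successes
    return cdf(Beta(α_prior + successes, β_prior + failures), threshold)
end

# n = 0 reproduces the prior; n = 100 reproduces the posterior from the question.
for n in (0, 100, 1_000, 10_000)
    println("n = ", n,
            ": P(p < 0.90) = ", round(prob_below(0.90, n); digits = 3),
            ", P(p < 0.89) = ", round(prob_below(0.89, n); digits = 3))
end
```

Under these assumptions, the probability below 0.90 creeps up toward 0.5 while staying under it, and the probability below 0.89 shrinks toward zero as n grows.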