# Beta-Binomial Bayesian Model - larger samples result in more uncertainty?

I have a question about the Beta-Binomial Bayesian model. Let’s say I’d like to implement some quality control using this method. Based on a previous review of 10 widgets, I start with the prior belief that 90% of widgets are assembled correctly:

```julia
using Distributions

α_prior = 9
β_prior = 1

dist_prior = Beta(α_prior, β_prior)
```

I want to answer the following question: What is the probability that less than 90% of widgets are assembled correctly? Using my prior, I can do `cdf(dist_prior, 0.9)` which yields a value of 0.387, so I say there’s a 39% chance that fewer than 90% of my widgets are assembled correctly. Now, I draw a sample of 100 widgets and discover that 90 of them were assembled correctly, and 10 of them were assembled incorrectly. I update my belief as follows:

```julia
α = α_prior + 90
β = β_prior + 10
dist = Beta(α, β)
```

This time, when I calculate `cdf(dist, 0.9)` I get 0.466, so I believe there’s a 47% chance that fewer than 90% of my widgets are assembled correctly. Why would it be more likely that the true success rate is less than 90% when my sample size is much larger?
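For reference, the two numbers above can be reproduced directly (a minimal check using the same Distributions.jl setup; the prior CDF at 0.9 is simply 0.9^9, since `Beta(9, 1)` has density proportional to `p^8`):

```julia
using Distributions

# Prior Beta(9, 1): P(p < 0.9) = 0.9^9 ≈ 0.387
cdf_prior = cdf(Beta(9, 1), 0.9)

# Posterior Beta(99, 11) after 90 successes and 10 failures.
# Its mean is exactly 99/110 = 0.9, so as the distribution
# concentrates, nearly half the mass sits below 0.9.
cdf_post = cdf(Beta(99, 11), 0.9)
```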

I get that the math works out that way, and I sort of see why: as α and β increase (with the ratio between them held constant), the Beta distribution becomes more concentrated around its mean, so more of the probability mass piles up near 0.9. Intuitively, though, this doesn’t make sense to me; I would have expected to become more confident as I take larger and larger samples.

I must be missing something…?

EDIT: Okay, I think I see what’s going on here. The threshold 0.9 is exactly the mean of my posterior, so as the distribution concentrates around it, `cdf(dist, 0.9)` settles toward roughly one half instead of shrinking. If I use a value strictly below the mean, say 0.89, the output of `cdf(dist, 0.89)` does in fact decrease with larger sample sizes, as I expected. I guess I’ll leave this topic here in case I make the same silly mistake in the future and come here looking for answers
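To make the comparison concrete, here is a quick check (same Distributions.jl setup as above) that the cumulative probability at 0.89, a threshold strictly below the posterior mean of 0.9, does shrink after the 100-widget sample:

```julia
using Distributions

# Below the mean, concentration pushes mass away from the threshold
c_prior = cdf(Beta(9, 1), 0.89)    # = 0.89^9 ≈ 0.350
c_post  = cdf(Beta(99, 11), 0.89)  # smaller than the prior value
```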


But is `Beta(9, 1)` the prior you are looking for?

The post takes the 9-1 split from the first review as the prior directly, but the prior belief is really that there is some `p` for which each widget has probability `p` of being correctly assembled. A reasonable representation of that belief is a flat `Beta(1, 1)` distribution. The first batch of items split 9-1, so the posterior after the first batch is `Beta(10, 2)`.

This `Beta(10, 2)` posterior behaves much more as expected: it has its mode at 0.9 and its density goes to zero at 1.0, which the density of `Beta(9, 1)` does not (counter to the intuition one might take from a 9-1 split).

With this posterior as the prior for the next batch, which splits 90-10, you get a posterior of `Beta(100, 12)`, which still has its mode at 0.9.
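This sequence of updates can be checked directly; `mode` and `pdf` below are the standard Distributions.jl functions:

```julia
using Distributions

# Flat prior, then the 9-1 batch, then the 90-10 batch
flat  = Beta(1, 1)
post1 = Beta(1 + 9, 1 + 1)     # Beta(10, 2)
post2 = Beta(10 + 90, 2 + 10)  # Beta(100, 12)

# Both posteriors put their mode at (α - 1) / (α + β - 2) = 0.9 ...
m1 = mode(post1)
m2 = mode(post2)

# ... and their densities vanish at p = 1, unlike Beta(9, 1)
d_post = pdf(post1, 1.0)
d_91   = pdf(Beta(9, 1), 1.0)
```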
