# Beta-Binomial Bayesian Model - larger samples result in more uncertainty?

I have a question about the Beta-Binomial Bayesian model. Let’s say I’d like to implement some quality control using this method. Based on a previous review of 10 widgets, I start with the prior belief that 90% of widgets are assembled correctly:

```julia
using Distributions

α_prior = 9
β_prior = 1

dist_prior = Beta(α_prior, β_prior)
```

I want to answer the following question: What is the probability that less than 90% of widgets are assembled correctly? Using my prior, I can do `cdf(dist_prior, 0.9)` which yields a value of 0.387, so I say there’s a 39% chance that fewer than 90% of my widgets are assembled correctly. Now, I draw a sample of 100 widgets and discover that 90 of them were assembled correctly, and 10 of them were assembled incorrectly. I update my belief as follows:

```julia
α = α_prior + 90
β = β_prior + 10
dist = Beta(α, β)
```

This time, when I calculate `cdf(dist, 0.9)` I get 0.466, so I believe there’s a 47% chance that fewer than 90% of my widgets are assembled correctly. Why would it be more likely that the true success rate is less than 90% when my sample size is much larger?
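For reference, the two numbers above can be reproduced directly (a minimal check using the same Distributions.jl setup; the prior CDF at 0.9 is simply 0.9^9, since `Beta(9, 1)` has density proportional to `p^8`):

```julia
using Distributions

# Prior Beta(9, 1): P(p < 0.9) = 0.9^9 ≈ 0.387
cdf_prior = cdf(Beta(9, 1), 0.9)

# Posterior Beta(99, 11) after 90 successes and 10 failures.
# Its mean is exactly 99/110 = 0.9, so as the distribution
# concentrates, nearly half the mass sits below 0.9.
cdf_post = cdf(Beta(99, 11), 0.9)
```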

I get that the math works out that way, and I sort of see why: as α and β increase (with the ratio between them held constant), the Beta distribution becomes more concentrated around its mean, so more of the probability mass piles up near 0.9. Intuitively, though, this doesn’t make sense to me; I would have expected to become more confident as I take larger and larger samples.

I must be missing something…?

EDIT: Okay, I think I see what’s going on here. The threshold 0.9 is exactly the mean of my posterior, so as the distribution concentrates around it, `cdf(dist, 0.9)` settles toward roughly one half instead of shrinking. If I use a value strictly below the mean, say 0.89, the output of `cdf(dist, 0.89)` does in fact decrease with larger sample sizes, as I expected. I guess I’ll leave this topic here in case I make the same silly mistake in the future and come here looking for answers
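To make the comparison concrete, here is a quick check (same Distributions.jl setup as above) that the cumulative probability at 0.89, a threshold strictly below the posterior mean of 0.9, does shrink after the 100-widget sample:

```julia
using Distributions

# Below the mean, concentration pushes mass away from the threshold
c_prior = cdf(Beta(9, 1), 0.89)    # = 0.89^9 ≈ 0.350
c_post  = cdf(Beta(99, 11), 0.89)  # smaller than the prior value
```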


But is `Beta(9, 1)` the prior you are looking for?

The post takes the 9-1 split from the first review as the prior directly, but the prior belief is really that there is some `p` for which each widget has probability `p` of being correctly assembled. A reasonable representation of that belief is a flat `Beta(1, 1)` distribution. The first batch of items split 9-1, so the posterior after the first batch is `Beta(10, 2)`.

This `Beta(10, 2)` posterior behaves much more as expected: it has its mode at 0.9 and its density goes to zero at 1.0, which the density of `Beta(9, 1)` does not (counter to the intuition one might take from a 9-1 split).

With this posterior as the prior for the next batch, which splits 90-10, you get a posterior of `Beta(100, 12)`, which still has its mode at 0.9.
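This sequence of updates can be checked directly; `mode` and `pdf` below are the standard Distributions.jl functions:

```julia
using Distributions

# Flat prior, then the 9-1 batch, then the 90-10 batch
flat  = Beta(1, 1)
post1 = Beta(1 + 9, 1 + 1)     # Beta(10, 2)
post2 = Beta(10 + 90, 2 + 10)  # Beta(100, 12)

# Both posteriors put their mode at (α - 1) / (α + β - 2) = 0.9 ...
m1 = mode(post1)
m2 = mode(post2)

# ... and their densities vanish at p = 1, unlike Beta(9, 1)
d_post = pdf(post1, 1.0)
d_91   = pdf(Beta(9, 1), 1.0)
```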
