Please forgive my naïveté (I don’t have any formal training in probabilistic programming or Bayesian statistics), but I’m trying to understand something about the coin-flipping example in the Turing.jl docs.
Here’s a bare-bones version of their example:
using Distributions
using StatsPlots
using Turing
using MCMCChains
using Random
Random.seed!(12)
data = rand(Bernoulli(0.5), 100)   # 100 simulated flips of a fair coin
@model coinflip(y) = begin
    # Prior on the heads probability; Beta(1, 1) is uniform on [0, 1].
    p ~ Beta(1, 1)

    # Each observed flip is Bernoulli with probability p.
    N = length(y)
    for n in 1:N
        y[n] ~ Bernoulli(p)
    end
end
iterations = 1000
ϵ = 0.05   # leapfrog step size
τ = 10     # number of leapfrog steps per iteration
chain = sample(coinflip(data), HMC(ϵ, τ), iterations)
This yields the following:
julia> p_summary = chain[:p]
Object of type Chains, with data of type 1000×1×1 Array{Union{Missing, Real},3}
Iterations = 1:1000
Thinning interval = 1
Chains = 1
Samples per chain = 1000
parameters = p
2-element Array{ChainDataFrame,1}
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼──────────┼───────────┼────────────┼─────────────┼─────────┼───────┤
│ 1 │ p │ 0.530459 │ 0.0554448 │ 0.00175332 │ 0.000617709 │ 1316.33 │ 0.999 │
Quantiles
│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1 │ p │ 0.422654 │ 0.49629 │ 0.530401 │ 0.566413 │ 0.635665 │
And plotting the posterior density of p gives a single bump centered near 0.5:
p = plot(p_summary, seriestype = :density, xlim = (0,1), legend = :best, w = 2, c = :blue)
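As a sanity check on that first run (my own addition, not part of the docs example): the Beta prior is conjugate to the Bernoulli likelihood, so the exact posterior is Beta(1 + heads, 1 + tails), and its mean and standard deviation should land close to the HMC summary above.

heads = sum(data)                              # number of observed 1s
tails = length(data) - heads                   # number of observed 0s
exact_posterior = Beta(1 + heads, 1 + tails)   # conjugate Beta(1, 1)-Bernoulli update
mean(exact_posterior), std(exact_posterior)    # should be close to the mean/std reported above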
This is all well and good, and I understand what’s going on. However, if I start to increase the length of the data array, things go haywire (in other words, I stop understanding what’s going on).
Changing the length of the data array from 100 to 100,000 (i.e., data = rand(Bernoulli(0.5), 100) becomes data = rand(Bernoulli(0.5), 100_000)) yields the following:
julia> p_summary = chain[:p]
.
.
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼──────────┼─────────────┼────────────┼─────────┼─────────┼─────────┤
│ 1 │ p │ 0.187243 │ 5.55389e-17 │ 1.7563e-18 │ 0.0 │ 4.01606 │ 1.00289 │
Quantiles
│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ 1 │ p │ 0.187243 │ 0.187243 │ 0.187243 │ 0.187243 │ 0.187243 │
And the plot is just blank, since the value of p is constant. Furthermore, if I run this same code several times, I get wildly different results for p.
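To see this more directly (my own diagnostic, not something from the docs), a trace plot of the raw samples should make the problem obvious: with a reported std of about 5.6e-17, the trace is essentially a flat line at whatever value the chain happened to land on.

plot(chain[:p], seriestype = :traceplot)   # a healthy chain looks like fuzzy noise around the posterior mean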
What’s going on here? I thought that as the amount of data increased, the posterior distribution would simply become taller and skinnier around the true value of 0.5. However, the model does a much better job of estimating p with 100 data points than it does with 100,000. What am I missing?
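For what it’s worth, here is the back-of-the-envelope calculation behind that expectation (my own arithmetic, assuming the conjugate Beta posterior and that roughly half of the N draws come up heads): the posterior standard deviation should shrink like 1/(2*sqrt(N)), so at N = 100,000 I would expect a width of roughly 0.0016, nothing like the 5.6e-17 reported above.

# Posterior std under a Beta(1, 1) prior if about half of N flips are heads.
expected_std(N) = std(Beta(1 + N ÷ 2, 1 + N - N ÷ 2))
expected_std(100)        # ≈ 0.049  -- roughly matches the first run
expected_std(100_000)    # ≈ 0.0016 -- what I expected for the second run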