Probabilistic Programming with Turing.jl - Question about example

I’m a beginner with Turing.jl, probabilistic programming, and Bayesian stats. I’m not understanding a key piece of the introductory coin-flip example in the docs, and I’m hoping someone can explain it to me.

I understand the first part, where they start with a prior distribution and then update it based on some data that is collected (without using Turing.jl):

using Distributions, Plots

# Setup from earlier in the docs example (not shown in my quote): the coin's
# true bias, the simulated flips, and the Beta(1, 1) prior.
p_true = 0.5
Ns = 0:100
data = rand(Bernoulli(p_true), last(Ns))
prior_belief = Beta(1, 1)

for (i, N) in enumerate(Ns)

    # Count the number of heads and tails among the first N observations.
    heads = sum(data[1:i-1])
    tails = N - heads

    # Update our prior belief in closed form (this is possible because we use a conjugate prior).
    updated_belief = Beta(prior_belief.α + heads, prior_belief.β + tails)

    # Plotting
    plot(updated_belief,
        size = (500, 250),
        title = "Updated belief after $N observations",
        xlabel = "probability of heads",
        ylabel = "",
        legend = nothing,
        xlim = (0, 1),
        fill = 0, α = 0.3, w = 3)
    vline!([p_true])
end

This idea of starting with a prior distribution Beta(1,1) and then updating it as you observe data makes perfect sense to me. However, when we get to the same example implemented with Turing.jl, I don’t understand what is going on inside the for loop, or where the observed data comes into play.

using Turing

@model coinflip(y) = begin
    
    # Our prior belief about the probability of heads in a coin.
    p ~ Beta(1, 1)
    
    # The number of observations.
    N = length(y)
    for n in 1:N
        # Heads or tails of a coin are drawn from a Bernoulli distribution.
        y[n] ~ Bernoulli(p)
    end
end

As I understand this, we are defining a model coinflip that takes some observed data as an argument. Inside this model we start with the same prior as before, where we believe that the parameter p, the probability of getting ‘heads’ on any given flip of the coin, is distributed as Beta(1, 1). Then, for every observation in our data, we are drawing from the Bernoulli distribution (??) and then…??

I think what I need to know is, what exactly is y[n] ~ Bernoulli(p)? Is there an equivalent in “plain vanilla” Julia code?

Are we replacing each value in y with a randomly drawn value from the Bernoulli distribution? If this is the case, why would we even need the data in the first place?

The syntax y ~ Bernoulli(p) means that the likelihood of seeing the value y is computed assuming it was drawn from a Bernoulli with the sampled value of p. The value of y is not actually sampled, because y is observed from your data. p, on the other hand, is actually sampled from a Beta(1, 1). What MCMC samplers do is compute the (log) likelihood for each of your data points and add them all up for a total likelihood of the data, which, combined with the prior, is what lets them draw samples of the parameter(s) from the posterior.
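
To make that concrete, here is a rough plain-Julia sketch of the quantity the sampler evaluates for a candidate p, using Distributions.jl. This is not Turing’s actual internals, and log_joint is just a made-up name for illustration:

using Distributions

# Unnormalized log posterior of the coin-flip model for a candidate p:
# the log prior of p plus the log likelihood of each observed flip.
function log_joint(p, y)
    lp = logpdf(Beta(1, 1), p)             # p ~ Beta(1, 1): prior term
    for n in eachindex(y)
        lp += logpdf(Bernoulli(p), y[n])   # y[n] ~ Bernoulli(p): observation term
    end
    return lp
end

log_joint(0.7, [1, 0, 1, 1])  # evaluate for p = 0.7 and four observed flips

An MCMC sampler evaluates this for many proposed values of p and favours values with higher log density, which is how the observed data shapes the posterior.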

Thanks, @dalejordan. So what you’re saying is that on each iteration of the loop, with y[n] ~ Bernoulli(p) we are computing the likelihood, as in loglikelihood(Bernoulli(p), [y[n]]) (the loglikelihood function from Distributions.jl)?

In Turing.jl, each y[k] ~ Bernoulli(p) statement is translated into a logpdf(Bernoulli(p), y[k]) statement. For MCMC samplers the resulting logpdf values are accumulated; for variational Bayes the behaviour is different. This is only true if y[k] is an observation.
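
For observations, those accumulated logpdf terms are exactly what loglikelihood from Distributions.jl computes, which you can check outside Turing (a small sketch with made-up data):

using Distributions

y = [1, 0, 1, 1, 0]
p = 0.6

# Summing the per-observation terms that the ~ statements contribute ...
sum(logpdf(Bernoulli(p), yk) for yk in y)

# ... gives the same number as the loglikelihood of the whole data set.
loglikelihood(Bernoulli(p), y)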

Thank you! 🙂