New Bayesian Statistics tutorials with Turing

I’ve just made some Bayesian statistics tutorials using Julia and Turing. The content is fully open-sourced on GitHub and has a very permissive Creative Commons license (which means you can copy and alter the crap out of it…).

My hope is to get more people into Bayesian statistics. The content is aimed at social scientists and PhD candidates in the social sciences. I chose an intuitive approach rather than focusing on rigorous mathematical formulations. I’ve made it the way I would have liked to be introduced to Bayesian statistics.

Check it out at Bayesian Statistics using Julia and Turing or on GitHub - storopoli/Bayesian-Julia: Bayesian Statistics using Julia and Turing

36 Likes

Brilliant. Thanks for this. Turing is ideal for learning Bayesian stats and probabilistic programming. I love that (much like Julia) it has a high-level, easy syntax, but you can dig deeper to access greater complexity and performance when needed.

1 Like

It looks very interesting; I’ll read it when I have some time.
Thank you.

I think you missed the “beta distribution”, very common in Bayesian Statistics.

And I have a quick question…
Most frequentist statistics only use the mean and the variance of the sample.
What about Bayesian statistical models? Do they use just the mean and the variance, or do they also make use of all the information in the measurements, i.e. all the moments?

We mostly use the full posterior density. And then you can calculate any moment or any other statistic you might fancy.
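For example (a hypothetical sketch: here random draws from a Beta stand in for the posterior samples an MCMC run would give you):

```julia
using Distributions, Statistics, StatsBase

# Pretend these are MCMC draws from the posterior of some parameter θ.
# In a real workflow they would come from sampling a Turing model.
post_draws = rand(Beta(5, 3), 10_000)

# Any summary can be computed from the full set of posterior draws:
mean(post_draws)                      # posterior mean
var(post_draws)                       # posterior variance
skewness(post_draws)                  # third standardized moment
quantile(post_draws, [0.025, 0.975])  # 95% credible interval
```

The point is that nothing is thrown away: the draws approximate the whole posterior density, so any moment or quantile is just a one-liner over them.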

1 Like

This looks really interesting and I’ll definitely work through the material in the next few days.

@Storopoli, quick question: in How to use Turing there’s this model:

@model dice_throw(y) = begin
    # Our prior belief about the probability of each result of a six-sided dice:
    # p is a vector of 6 probabilities that sum up to 1.
    p ~ Dirichlet(6, 1)

    # Each outcome of the six-sided dice has probability p[i].
    for i in eachindex(y)
        y[i] ~ Categorical(p)
    end
end;

And then it says:

Instead, I’ve opted for a Dirichlet with a weakly informative prior towards a “fair” dice, which is encoded as a Dirichlet(6, 1).

  • What’s a “weakly informative prior towards a ‘fair’ dice”?
  • Why do we choose Dirichlet(6, 1) and not, say, Dirichlet(6, 8)? As far as I can tell, they all have mean 0.166667:
all(
    all(mean(Dirichlet(6, a)) .≈ 1/6)
    for a ∈ 0.01:0.001:30
) == true

Is this going to be a philosophical question? I don’t know; I think it’s a length-6 vector of 1/6?

Good question. I honestly don’t know. Looking at the help:

help?> Dirichlet
search: Dirichlet DirichletMultinomial

  Dirichlet

  The Dirichlet distribution (http://en.wikipedia.org/wiki/Dirichlet_distribution) is often used as the conjugate prior for Categorical
  or Multinomial distributions. The probability density function of a Dirichlet distribution with parameter \alpha = (\alpha_1, \ldots,
  \alpha_k) is:

  f(x; \alpha) = \frac{1}{B(\alpha)} \prod_{i=1}^k x_i^{\alpha_i - 1}, \quad \text{ with }
B(\alpha) = \frac{\prod_{i=1}^k \Gamma(\alpha_i)}{\Gamma \left( \sum_{i=1}^k \alpha_i \right)},
\quad x_1 + \cdots + x_k = 1

  # Let alpha be a vector
  Dirichlet(alpha)         # Dirichlet distribution with parameter vector alpha
  
  # Let a be a positive scalar
  Dirichlet(k, a)          # Dirichlet distribution with parameter a * ones(k)

and then:

julia> mean(Dirichlet(6,1))
6-element Fill{Float64}: entries equal to 0.16666666666666666

julia> mean(Dirichlet(6,8))
6-element Fill{Float64}: entries equal to 0.16666666666666666

Dirichlet(6, 8) implies you are quite confident that it’s a fair dice, whereas Dirichlet(6, 1) says “I don’t really know”.

1 Like

The alpha in a Dirichlet is a “concentration”. Maybe this helps:


julia> std(rand(Dirichlet(6,1),1000),dims=2)
6×1 Matrix{Float64}:
 0.14305158706755236
 0.14043323912305447
 0.12959278663394877
 0.14360349478154502
 0.14403746553339422
 0.13586448356346292

julia> std(rand(Dirichlet(6,8),1000),dims=2)
6×1 Matrix{Float64}:
 0.05263278171058433
 0.0531789315822342
 0.0524370633051767
 0.05355511158379274
 0.05450811672204576
 0.05293590862720139

A concentration of alpha=1 gives a uniform distribution over the simplex. Also, it can help to remember that the Dirichlet distribution generalizes a Beta distribution, where alpha and beta play the same role.
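To make the Beta connection concrete (a small sketch; the value 8 for the concentration is just an arbitrary choice for illustration): the first coordinate of a two-component Dirichlet(2, a) draw has a Beta(a, a) marginal, so the two sets of draws should match in location and spread.

```julia
using Distributions, Statistics

a = 8.0  # arbitrary concentration, just for this demo

# First coordinate of Dirichlet([a, a]) draws vs. direct Beta(a, a) draws:
dir_first  = [rand(Dirichlet(2, a))[1] for _ in 1:10_000]
beta_draws = rand(Beta(a, a), 10_000)

# Both concentrate around 1/2 with (approximately) the same spread:
(mean(dir_first), std(dir_first))
(mean(beta_draws), std(beta_draws))
```

The same intuition then carries over from Beta to higher-dimensional Dirichlets: bigger alphas, tighter concentration around the center of the simplex.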

3 Likes

Oh, I thought “weakly informative prior” was a term that I didn’t understand.

I messed around with the Dirichlet distribution at The distribution zoo, and it looks like when the vector of parameters is all ones, like Dirichlet(ones(3)) or Dirichlet(3, 1), we get a sort of “uniform”, flat distribution.

It seems to behave similarly to the Beta distribution: when both its parameters are high, its PDF starts looking more and more like a normal with mean 0.5. When all n of the Dirichlet’s parameters are equal and increase, the PDF also concentrates more and more around the middle: 1/n.

So it looks to me like Dirichlet(ones(n)) has the flattest PDF and is thus the most “uniform” among Dirichlet distributions with other parameters.
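You can check the flatness directly (a minimal sketch; the two probability vectors below are arbitrary points on the simplex): with all alphas equal to 1 the density is the same constant everywhere, namely the normalizing constant Γ(n), which is 2 for n = 3.

```julia
using Distributions

d = Dirichlet(ones(3))

# Any two valid probability vectors get the same density value:
pdf(d, [0.2, 0.3, 0.5])   # 2.0
pdf(d, [0.1, 0.1, 0.8])   # 2.0, i.e. Γ(3) = 2, the normalizing constant
```

With larger equal alphas, say Dirichlet(3, 8), the density is no longer constant and peaks at the center [1/3, 1/3, 1/3].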

Yeah, I was aiming for a uniform flat distribution. And what a great demonstration by @cscherrer, thanks!

1 Like

BTW, @Storopoli, these are really nice tutorials, way easier to understand than those in Turing’s documentation. So thank you!

Hey @ForceBru, thanks! I was invited to give a talk to Stuttgart’s Julia Meetup Group (Bayesian Inference with Turing.jl | Meetup).

I will cover some material that I haven’t covered in the tutorials. It will be next Saturday and open to everybody. It will be in English.

Edit: I’ve found that Turing’s examples are geared more towards computer science/machine learning folks, not so much towards someone who had classical stats training and then learned it all over again with Bayesian statistics.

1 Like

Cool, just signed up for the talk!