I’ve just made some Bayesian statistics tutorials using Julia and Turing. The content is fully open-sourced on GitHub and carries a very permissive Creative Commons license (which means you can copy and alter the crap out of it…).

My hope is to bring more people into Bayesian statistics. The content is aimed at social scientists and PhD candidates in the social sciences. I chose an intuitive approach rather than focusing on rigorous mathematical formulations. I made it the way I would have liked to be introduced to Bayesian statistics.

Brilliant, thanks for this. Turing is ideal for learning Bayesian stats and probabilistic programming. I love that (much like Julia) it has a high-level, easy syntax, but you can dig deeper to access greater complexity and performance when needed.

It looks very interesting; I’ll read it when I have some time.
Thank you.

I think you missed the beta distribution, which is very common in Bayesian statistics.

And I have a quick question…
Most frequentist statistics use only the mean and the variance of the sample.
What about Bayesian models? Do they use just the mean and the variance, or do they make use of all the information in the measurements, i.e. all the moments?

@model dice_throw(y) = begin
    # Our prior belief about the probability of each face of a six-sided die.
    # p is a vector of 6 probabilities that sum to 1.
    p ~ Dirichlet(6, 1)
    # Each observed outcome follows a Categorical distribution with probabilities p.
    for i in eachindex(y)
        y[i] ~ Categorical(p)
    end
end;
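As an aside, the generative process this model encodes can be simulated directly (a sketch using Distributions.jl; the seed and sample size here are my own choices, not from the tutorial):

```julia
using Distributions, Random

Random.seed!(42)
# Draw one probability vector from the Dirichlet(6, 1) prior...
p = rand(Dirichlet(6, 1))
# ...and simulate 1_000 die throws with those face probabilities.
y = rand(Categorical(p), 1_000)
# p is a valid probability vector: its entries sum to 1.
sum(p)
```

This is essentially a prior predictive simulation: each draw of `p` is one plausible die under the prior, and `y` is what its throws would look like.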

And then it says:

Instead, I’ve opted for a Dirichlet with a weakly informative prior towards a “fair” die, which is encoded as Dirichlet(6, 1).

What’s a “weakly informative prior towards a ‘fair’ die”?

Why do we choose Dirichlet(6, 1) and not, say, Dirichlet(6, 8)? As far as I can tell, all of them have mean 0.166667:

all(
    all(mean(Dirichlet(6, a)) .≈ 1/6)
    for a ∈ 0.01:.001:30
) == true

Is this going to be a philosophical question? I don’t know; I think the mean is a length-6 vector of 1/6?

Good question. I honestly don’t know. Looking at the help:

help?> Dirichlet
search: Dirichlet DirichletMultinomial

  Dirichlet

  The Dirichlet distribution (http://en.wikipedia.org/wiki/Dirichlet_distribution) is often used as the conjugate prior for Categorical or Multinomial distributions. The probability density function of a Dirichlet distribution with parameter \alpha = (\alpha_1, \ldots, \alpha_k) is:

  f(x; \alpha) = \frac{1}{B(\alpha)} \prod_{i=1}^k x_i^{\alpha_i - 1}, \quad \text{with } B(\alpha) = \frac{\prod_{i=1}^k \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^k \alpha_i\right)}, \quad x_1 + \cdots + x_k = 1

  # Let alpha be a vector
  Dirichlet(alpha)   # Dirichlet distribution with parameter vector alpha

  # Let a be a positive scalar
  Dirichlet(k, a)    # Dirichlet distribution with parameter a * ones(k)

and then:

julia> mean(Dirichlet(6, 1))
6-element Fill{Float64}: entries equal to 0.16666666666666666

julia> mean(Dirichlet(6, 8))
6-element Fill{Float64}: entries equal to 0.16666666666666666

A concentration of alpha = 1 gives a uniform distribution over the simplex. It can also help to remember that the Dirichlet distribution generalizes the Beta distribution: a Beta(α, β) is just a Dirichlet with parameter vector (α, β).
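One way to see the difference between Dirichlet(6, 1) and Dirichlet(6, 8) despite the identical mean (a sketch using Distributions.jl): the variance shrinks as the concentration grows, so a higher concentration puts more prior mass near the “fair” vector.

```julia
using Distributions

# Both priors have the same mean: every entry is 1/6.
mean(Dirichlet(6, 1))
mean(Dirichlet(6, 8))

# But the per-component variance differs: larger concentration,
# smaller variance, i.e. a prior more tightly concentrated around fairness.
var(Dirichlet(6, 1))   # larger entries
var(Dirichlet(6, 8))   # smaller entries
```

So the mean alone can’t distinguish the two priors; the spread around that mean is what the concentration parameter controls.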

Oh, I thought that was a term that I didn’t understand.

I messed around with the Dirichlet distribution at The distribution zoo, and it looks like when the vector of parameters is all ones, like Dirichlet(ones(3)) or Dirichlet(3, 1), we get a sort of “uniform”, flat distribution.

It seems to behave similarly to the Beta distribution: when both of its parameters are high, its PDF looks more and more like a normal with mean 0.5. When all n of the Dirichlet’s parameters are equal and increase, the PDF also concentrates more and more around the middle: 1/n.

So it looks to me that Dirichlet(ones(n)) has the flattest PDF and is thus the most “uniform” among Dirichlet distributions with other parameters.
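A quick check of that intuition (a sketch using Distributions.jl): with all parameters equal to 1, the density is the same at every point of the simplex, i.e. exactly uniform. For k = 3 the normalizing constant is 1/B(1,1,1) = Γ(3) = 2.

```julia
using Distributions

d = Dirichlet(3, 1.0)

# For alpha = ones(k), the density is constant over the whole simplex:
pdf(d, [1/3, 1/3, 1/3])   # 2.0
pdf(d, [0.7, 0.2, 0.1])   # 2.0, same value at a very "unfair" point
```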

I will cover some material that I haven’t gone over in the tutorials. It will be next Saturday and open to everybody, and it will be in English.

Edit: I’ve found that Turing’s examples are geared more towards computer science/machine learning folks, not so much towards someone who had classical stats training and then learned everything all over again with Bayesian statistics.