I’ve just made some Bayesian Statistics tutorials using Julia and Turing. The content is fully open-sourced on GitHub and has a very permissive Creative Commons license (which means you can copy and alter the crap out of it…).
My hope is to get more people into Bayesian statistics. The content is aimed at social scientists and PhD candidates in the social sciences. I chose to provide an intuitive approach rather than focusing on rigorous mathematical formulations. I’ve made it the way I would have liked to be introduced to Bayesian statistics.
Brilliant. Thanks for this. Turing is ideal for learning Bayesian stats and probabilistic programming. I love that (much like Julia) it has a high-level, easy syntax, but you can dig deeper to access greater complexity and performance when needed.
It looks very interesting, I’ll read it when I have some time.
Thank you.
I think you missed the “beta distribution”, which is very common in Bayesian statistics.
And I have a quick question…
Most frequentist statistics only use the mean and the variance of the sample.
What about Bayesian statistical models? Do they use just the mean and the variance, or do they also make use of all the information in the measurements, i.e. all the moments?
@model dice_throw(y) = begin
    # Our prior belief about the probability of each result of a six-sided dice.
    # p is a vector of length 6 whose entries are probabilities that sum to 1.
    p ~ Dirichlet(6, 1)
    # Each observed throw of the six-sided dice is a categorical draw with probabilities p.
    for i in eachindex(y)
        y[i] ~ Categorical(p)
    end
end;
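(As an aside, a minimal sketch of how this model could be run end to end. The simulated data and the choice of the NUTS sampler below are my assumptions, not necessarily what the tutorial does.)

using Turing, Distributions

# Hypothetical data: 100 throws of a fair six-sided dice (uniform over 1:6).
y = rand(Categorical(6), 100)

model = dice_throw(y)
# p is continuous, so a gradient-based sampler such as NUTS is a reasonable choice.
chain = sample(model, NUTS(), 1_000)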
And then the tutorial says:
Instead, I’ve opted for a Dirichlet with a weakly informative prior towards a “fair” dice, which is encoded as a Dirichlet(6, 1).
What’s a “weakly informative prior towards a ‘fair’ dice”?
Why do we choose Dirichlet(6, 1) and not, say, Dirichlet(6, 8)? As far as I can tell, all of them have mean 0.166667:
using Distributions

all(
    all(mean(Dirichlet(6, a)) .≈ 1/6)
    for a ∈ 0.01:.001:30
) == true
Is this going to be a philosophical question? I don’t know, I think a 6-length vector of 1/6?
Good question. I honestly don’t know. Looking at the help:
help?> Dirichlet
search: Dirichlet DirichletMultinomial
Dirichlet
The Dirichlet distribution (http://en.wikipedia.org/wiki/Dirichlet_distribution) is often used as the conjugate prior for Categorical
or Multinomial distributions. The probability density function of a Dirichlet distribution with parameter \alpha = (\alpha_1, \ldots,
\alpha_k) is:
f(x; \alpha) = \frac{1}{B(\alpha)} \prod_{i=1}^k x_i^{\alpha_i - 1}, \quad \text{ with }
B(\alpha) = \frac{\prod_{i=1}^k \Gamma(\alpha_i)}{\Gamma \left( \sum_{i=1}^k \alpha_i \right)},
\quad x_1 + \cdots + x_k = 1
# Let alpha be a vector
Dirichlet(alpha) # Dirichlet distribution with parameter vector alpha
# Let a be a positive scalar
Dirichlet(k, a) # Dirichlet distribution with parameter a * ones(k)
and then:
julia> mean(Dirichlet(6,1))
6-element Fill{Float64}: entries equal to 0.16666666666666666
julia> mean(Dirichlet(6,8))
6-element Fill{Float64}: entries equal to 0.16666666666666666
A concentration of alpha = 1 gives a uniform distribution over the simplex. Also, it can help to remember that the Dirichlet distribution generalizes the Beta distribution: a Beta(α, β) is just a two-component Dirichlet([α, β]), so α and β play the same role as the Dirichlet’s concentration parameters.
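A quick numerical sketch of both points, assuming Distributions.jl is loaded (the α, β values below are arbitrary); it also ties back to the earlier Dirichlet(6, 1) vs. Dirichlet(6, 8) question, since the two differ in variance even though their means agree:

using Distributions

# A two-component Dirichlet is a Beta in disguise:
# if (x, 1 - x) ~ Dirichlet([α, β]), then x ~ Beta(α, β).
α, β = 2.0, 5.0
pdf(Dirichlet([α, β]), [0.3, 0.7]) ≈ pdf(Beta(α, β), 0.3)   # true

# Same mean (1/6 per face), but the larger concentration makes the prior tighter:
var(Dirichlet(6, 1))   # entries ≈ 0.0198
var(Dirichlet(6, 8))   # entries ≈ 0.0028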
Oh, I thought that was a term that I didn’t understand.
I messed around with the Dirichlet distribution at https://ben18785.shinyapps.io/distribution-zoo/, and it looks like when the vector of parameters is all ones, like Dirichlet(ones(3)) or Dirichlet(3, 1), we get a sort of “uniform”, flat distribution.
It seems to behave similarly to the Beta distribution: when both of the Beta’s parameters are high, its PDF starts looking more and more like a normal with mean 0.5. When all n of the Dirichlet’s parameters are equal and increase, the PDF also concentrates more and more around the middle, 1/n.
So it looks to me like Dirichlet(ones(n)) has the flattest PDF and is thus the most “uniform” among Dirichlet distributions.
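A direct density check seems to agree (a sketch assuming Distributions.jl; the probe points are arbitrary):

using Distributions

# Dirichlet(ones(3)) assigns the same density to every point on the simplex,
# i.e. it is the uniform distribution over length-3 probability vectors.
d = Dirichlet(ones(3))
pdf(d, [1/3, 1/3, 1/3])   # 2.0
pdf(d, [0.7, 0.2, 0.1])   # 2.0 (constant everywhere on the simplex)

# Raising the (equal) concentration pulls the mass towards the middle, 1/n:
d10 = Dirichlet(3, 10)
pdf(d10, [1/3, 1/3, 1/3]) > pdf(d10, [0.7, 0.2, 0.1])   # true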
I will cover some stuff that I haven’t gone over in the tutorials. It will be next Saturday and open to everybody. It will be in English.
Edit: I’ve found that Turing’s examples are more geared towards computer science/machine learning folks, not so much towards someone who had classical stats training and then learned everything all over again with Bayesian statistics.
Hi @Storopoli, I am a new user of Turing and Julia for Bayesian modeling. I was trying to replicate a basic linear regression model from your GitHub link (the Children’s IQ Score example). I could replicate the model but could not plot the posterior results / trace using StatsPlots.
When I type using StatsPlots and try to run it in the Jupyter notebook, it gives errors stating that one of StatsPlots’ dependencies, Arpack, has not been installed correctly. See this link (I posted this under New to Julia): Arpack error with StatsPlots
Any pointers in this regard? I’m coming from R- and Python-based Bayesian models using brms and rstan as well as pymc. Are there any alternatives to StatsPlots that would allow me to plot the Bayesian model results?
I’m on Ubuntu 21.04 using Julia 1.7.3. After posting my question on alternative graphing libraries, I found an online reference to using Turing and ArviZ together. For the time being, I’m using ArviZ.
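For anyone else hitting the same thing, a rough sketch of combining Turing and ArviZ is below. Treat the function names as an assumption on my part: they worked in the ArviZ.jl versions I’ve seen, but the plotting API may differ between versions.

using Turing, ArviZ

# A toy model just to have a chain to plot (not the IQ example itself).
@model function coinflip(y)
    p ~ Beta(1, 1)
    for i in eachindex(y)
        y[i] ~ Bernoulli(p)
    end
end

chain = sample(coinflip(rand(Bool, 50)), NUTS(), 1_000)

# Convert the MCMCChains.Chains object into ArviZ's InferenceData and plot it.
idata = from_mcmcchains(chain)
plot_trace(idata)
plot_posterior(idata)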
I have tried to install StatsPlots and dependencies multiple times by following different instructions on Github and Discourse based on other Linux users experiences, but so far no success. I have also opened an issue at github/StatsPlots.
I will check the reference on using Makie.jl - thanks again for your help.
@Storopoli I have another question for you: Does Turing have a front-end equivalent such as brms for rstan in R or such as bambi for pymc in Python? Both brms and bambi permit a simple formula-based approach to specifying advanced Bayesian multi-level models, for example.
The brms model specification is very much like that of lme4 in R, and bambi, inspired by brms, follows a specification very close to that of StatsModels in Python. For myself and a lot of others, both of these interfaces (brms and bambi) made Bayesian modeling more accessible and simpler.
Would love to see something like that for Turing as well.