Learning Bayesian data analysis using Julia


#1

Hi everyone,

I’m trying to teach myself Bayesian data analysis (and Julia) by working through Statistical Rethinking by Richard McElreath. Has anyone, familiar with the book, tried working through the code examples using Julia? I know someone has worked through the code for this book and Doing Bayesian Data Analysis by John Kruschke using Python and PyMC3. Is it possible to do the same with Julia at this time? Is there anything equivalent to PyMC3 in Julia right now that could be used by someone who is just looking to perform data analysis?

Considering I’m also not very experienced with Julia, would it make more sense for me, at this time, to stick with R when working through the code examples? Statistical Rethinking uses a custom R library and I’ve had difficultly trying to find equivalencies for some of the functions (e.g. maximum a posteriori).


#2

I worked through the Kruschke book as an undergrad years ago and the vast majority of itwas with JAGS. If you replace that with Stan.jl it’s almost the same thing. If you want to be more adventurous, give things like Mamba.jl (or Turing.jl is really really awesome!) a try.


#3

I’d definitely be keen to hear your experiences with doing this.


#4

I would recommend using Stan over Jaggs because the NUTS algorithm will converge better and faster across a large range of models, especially hierarchical models. My college ported Krushke’s models to Stan here: DBDA2Estan. Stan.jl are Rstan are very similar. The main difference is which language you prefer to use outside of Stan.

Unfortunately Stan.jl does not have an analog to the function map2stan referenced in Statistical Rethinking. map2stan uses high level syntax (similar to lme4) to auto-generate Stan models. As such, it hides some of the details of the model, which might be undesirable for someone who wants to understand the nuts and bolts of the model and learn to customize models.


#6

Thanks, I hadn’t heard of Turing.jl before you mentioned it. Here is a block of code from chapter 4 of McElreath that I’ve been having difficulty with:

flist <- alist(
height ~ dnorm(mu, sigma),
mu ~ dnorm(178, 20),
sigma ~ dunif(0, 50
)

The matching code with PyMC3 looks like this:

with pm.Model() as m4_1:
mu = pm.Normal(‘mu’, mu=178, sd=20)
sigma = pm.Uniform(‘sigma’, lower=0, upper=50)
height = pm.Normal(‘height’, mu=mu, sd=sigma, observed=d2.height)

An example on the Turing.jl site looks very similar to these. It might be what i"m looking for. Unfortunately, it doesn’t seem to work with 0.6 at the moment.


#7

I certainly have had to work at a slow pace partly because I don’t know what packages are available (if any) that would allow me to replicate some of the R code like what I had mentioned above. I’m doing all of the work in notebooks so I would be willing to post them for everyone to look at. And I certainly would welcome any feedback.


#8

Are there any packages that will do maximum a posteriori? I had seen something mentioned in the documentation for Distributions.jl, but it’s only a mention.


#9

Stan would be the easiest. Eg, just use Optimize() instead of Sample(). Example:

Another option is of course Optim:
Optim.jl

I’m also developing a package that uses adaptive sparse grid quadrature to solve Bayesian models. Extremely fast (compared to MCMC) in a few tests, but I will have to try more with larger numbers of parameters. I don’t know yet where the cross over point is likely to be, but I’ve still fit Stan models with >30,000 parameters and imagine it is going to be far south of that.
One of the functions finds modes within the unconstrained parameter space; I could easily add the option to have it return proper MAPs by just leaving off the Jacobians. It uses Optim; the advantage is that it automates (like Stan) the transformations to an unconstrained space.
The advantages over Stan is that ForwardDiff.jl gives Hessians via auto-diff, allowing it to use NewtonTrustRegion, which I have found to converge most consistently.

Stan only has autodiff for gradients, but its optimizer is still great (and (L)BFGS is quite solid).

EDIT:
Literal translation:

height_model = "
data {
  vector height;
}
parameters {
  real mu;
  real<lower = 0, upper = 50> sigma;
}
model {
  height ~ normal(mu, sigma);
  mu ~ normal(178, 20);
)

I would probably move the prior mean and standard deviation to the data block (naming them something like mu_0 and sigma_0) so you could explore prior sensitivity without having to recompile.

I really like McElreath’s book, but would also recommend you learn Stan itself instead of just the “rethinking” package.
You can also extract the actual Stan code the rethinking package generates, which could help while still learning Stan’s syntax.
Stan’s documentation is also marvelous.