Learning Bayesian data analysis using Julia


#1

Hi everyone,

I’m trying to teach myself Bayesian data analysis (and Julia) by working through Statistical Rethinking by Richard McElreath. Has anyone, familiar with the book, tried working through the code examples using Julia? I know someone has worked through the code for this book and Doing Bayesian Data Analysis by John Kruschke using Python and PyMC3. Is it possible to do the same with Julia at this time? Is there anything equivalent to PyMC3 in Julia right now that could be used by someone who is just looking to perform data analysis?

Considering I’m also not very experienced with Julia, would it make more sense for me, at this time, to stick with R when working through the code examples? Statistical Rethinking uses a custom R library and I’ve had difficultly trying to find equivalencies for some of the functions (e.g. maximum a posteriori).


#2

I worked through the Kruschke book as an undergrad years ago and the vast majority of itwas with JAGS. If you replace that with Stan.jl it’s almost the same thing. If you want to be more adventurous, give things like Mamba.jl (or Turing.jl is really really awesome!) a try.


#3

I’d definitely be keen to hear your experiences with doing this.


#4

I would recommend using Stan over Jaggs because the NUTS algorithm will converge better and faster across a large range of models, especially hierarchical models. My college ported Krushke’s models to Stan here: DBDA2Estan. Stan.jl are Rstan are very similar. The main difference is which language you prefer to use outside of Stan.

Unfortunately Stan.jl does not have an analog to the function map2stan referenced in Statistical Rethinking. map2stan uses high level syntax (similar to lme4) to auto-generate Stan models. As such, it hides some of the details of the model, which might be undesirable for someone who wants to understand the nuts and bolts of the model and learn to customize models.


#6

Thanks, I hadn’t heard of Turing.jl before you mentioned it. Here is a block of code from chapter 4 of McElreath that I’ve been having difficulty with:

flist <- alist(
height ~ dnorm(mu, sigma),
mu ~ dnorm(178, 20),
sigma ~ dunif(0, 50
)

The matching code with PyMC3 looks like this:

with pm.Model() as m4_1:
mu = pm.Normal(‘mu’, mu=178, sd=20)
sigma = pm.Uniform(‘sigma’, lower=0, upper=50)
height = pm.Normal(‘height’, mu=mu, sd=sigma, observed=d2.height)

An example on the Turing.jl site looks very similar to these. It might be what i"m looking for. Unfortunately, it doesn’t seem to work with 0.6 at the moment.


#7

I certainly have had to work at a slow pace partly because I don’t know what packages are available (if any) that would allow me to replicate some of the R code like what I had mentioned above. I’m doing all of the work in notebooks so I would be willing to post them for everyone to look at. And I certainly would welcome any feedback.


#8

Are there any packages that will do maximum a posteriori? I had seen something mentioned in the documentation for Distributions.jl, but it’s only a mention.


#9

Stan would be the easiest. Eg, just use Optimize() instead of Sample(). Example:

Another option is of course Optim:
Optim.jl

I’m also developing a package that uses adaptive sparse grid quadrature to solve Bayesian models. Extremely fast (compared to MCMC) in a few tests, but I will have to try more with larger numbers of parameters. I don’t know yet where the cross over point is likely to be, but I’ve still fit Stan models with >30,000 parameters and imagine it is going to be far south of that.
One of the functions finds modes within the unconstrained parameter space; I could easily add the option to have it return proper MAPs by just leaving off the Jacobians. It uses Optim; the advantage is that it automates (like Stan) the transformations to an unconstrained space.
The advantages over Stan is that ForwardDiff.jl gives Hessians via auto-diff, allowing it to use NewtonTrustRegion, which I have found to converge most consistently.

Stan only has autodiff for gradients, but its optimizer is still great (and (L)BFGS is quite solid).

EDIT:
Literal translation:

height_model = "
data {
  vector height;
}
parameters {
  real mu;
  real<lower = 0, upper = 50> sigma;
}
model {
  height ~ normal(mu, sigma);
  mu ~ normal(178, 20);
)

I would probably move the prior mean and standard deviation to the data block (naming them something like mu_0 and sigma_0) so you could explore prior sensitivity without having to recompile.

I really like McElreath’s book, but would also recommend you learn Stan itself instead of just the “rethinking” package.
You can also extract the actual Stan code the rethinking package generates, which could help while still learning Stan’s syntax.
Stan’s documentation is also marvelous.


#10

How does Stan compare with other Julia packages such as DynamicHMC.jl, Lora.jl, Klara.jl or BIAS.jl?
Which one converges faster?
Which one is simpler to use?
With Stan it’s difficult to deal with discrete parameters, how difficult is it with other Julia packages?
Do we have any Julia package like rstanarm or brms to automatically fit a regression model defined by a simple equation using MCMC?


#11

I’d go check Turing.jl


#12

@Juan just as a disclaimer, I only have a cursory knowledge of the packages and MCMC in general. My general suspicion is that, as an older project, Stan is further developed. In the long run, however, the primary advantage of a package that is written in Julia is that it is easier to modify, understand, and use with other Julia packages. This is the first time I came across Turing.jl. Here are a few of my initial thoughts for what it’s worth: It looks promising over all. I like the ability to specify custom distributions as well as the Stan like syntax (e.g. θ ~ Beta(5,5)). It would be nice if it was possible to vectorize the ~ operation with something like x .~ Normal(mu,sigma). Based on the documentation, its not clear to me whether running parallel chains is currently possible. If not, it would be great to have something like: HMC(iterations, ϵ, τ,Nchains), and to perhaps support keyword arguments so that relying on knowledge of positioning is not necessary. I also noticed it does not appear to be possible to index Turing.Chain objects by parameter name, which is possible with MCMCChain.jl. This allows one to create an array of plots by parameter to prevent overcrowding like so: plots = map(x->plot(chain[:,x,:]),chain.names). Perhaps these features are already there unbeknownst to me or they could be added.

After looking at the Stan repo, there does not appear to be any plans to create an interface like BRMS.


#13

I am the author of DynamicHMC.jl, which underwent and API redesign recently.

  1. Convergence depends on the algorithm, mostly. DynamicHMC.jl uses mostly the same algorithm as Stan. That said, the O(1) cost of evaluating the likelihood is important. I wrote DynamicHMC.jl to be able to optimize that within Julia (using profiling, optimizations, unit tested, etc).
  2. Turing.jl and Lora.jl would be easiest to you if you don’t like coding likelihoods directly. If you do, DynamicHMC.jl is probably the most suitable.
  3. The HMC algorithm works in continuous parameter spaces, ideally the whole unconstrained \mathbb{R}^n (which is why you use transformations). You have to integrate out discrete ones, as described in the Stan manual.

Finally, the subjective part: for learning Bayesian analysis, I think it is beneficial to code your likelihoods, instead of using a DSL. You gain a deeper understanding that way. Once you know what you are doing, the extra cost of doing it is tiny.

Stan and similar probabilistic DSLs inevitably run into difficulties when it comes to transformations. I don’t know how Turing.jl does it, but in Stan you have to increment the log likelihood “manually”, which defeats the whole purpose. I am currently experimenting with an approach to make programming these easier, but that is WIP.


#14

I agree with Tamas. Julia is a nice language to set up Bayesian analysis. With Distributions and maybe ConjugatePriors it becomes straightforward to implement Markov chains. For example a Gibbs sampler sampling the posterior parameters of normal observations with mean distributed a priori as N(μ0, σ0) and precision 1/σ^2 a priori as Gamma(α, β):

using Distributions

function mc(x, iterations, μ0 = 0., τ0 = 1e-7, α = 0.0001, β = 0.0001)
    n = length(x)
    sumx = sum(x)
    μ, τ = sumx/n, 1/var(x)
    μs, τs = Float64[], Float64[]
    for i in 1:iterations
        μ = rand(Normal((τ0*μ0 + τ*sumx)/(τ0 + n*τ), 1/sqrt(τ0 + n*τ)))
        τ = rand(Gamma(α + n/2, 1/(β + 0.5*sum((xᵢ-μ)^2 for xᵢ in x))))
        push!(μs, μ)
        push!(τs, τ)
    end
    μs, τs
end

μs, τs = mc(rand(Normal(0.2, 1.7^(-0.5)), 1_000), 10_000)
println("mean μ = ", mean(μs), " ± ", std(μs)) 
println("precision τ = ", mean(τs), " ± ", std(τs)) 

#15

Hi,
Turing currently internally recognises if a tilde call should be broadcasted. Very much like Stan does. We currently discuss ideas on making the syntax more explicit and closer to the Julia standard.

Regarding the Chain type. This will at some point be completely replaced by MCMCChain.Chains. It’s not quite possible at the moment but we are working on it.

Besides these points we are working on making the code base cleaner and more accessible so that it is easy for everyone to extend Turing. Further, many members of the team are active ML researchers that work on MCMC algorithms and/or BNP. And we will push Turing further towards state-of-the-art after the code is cleaner and the performance has been improved.


#16

Turing uses the Bijector package from TuringLang to handle transformations.


#17

How does it compare to other packages such as Klara, dynamicHMC, Lora, Mamba…


#18

Awesome! Thanks for the information. I look forward to seeing the new features. Just out of curiosity, can you tell me whether it is or will be possible to run multiple chains in parallel? I didn’t see anything in the documentation.


#19

Turing is a universal probabilistic programming language (PPL). In contrast to other packages you can implement your model by specifying the generative process. Further, Turing provides stochastic control flows and provides inference algorithms for discrete and continuous RVs. And in contrast to several other PPLs, Turing allows you to compose inference algorithms.

Please have a look at the documentation and the new tutorials if want to learn more about Turing.


#20

Currently you have to write your own distributed for loop around the sample function for this. Feel free to open an issue on GitHub if you run into problems or have questions.


#21

Any plans for deep bayesian models with flux.jl and variational inference like ADVI, in the vein of Pyro?