Best practices for weak and uninformative priors in Turing

I am working through a question for a class taught in R and I am trying to recreate the class examples in Julia. Now that we have started doing bayesian regression I have hit a bit of a learning curve, because I am doing things using the Turing package, while the class is using rstanarm. Here is the example I am currently stuck on:

prior data from the delta wave let’s us know that the rate of hospitalization is ~N(0.45,0.125). We are given data for the omicron wave and are asked to do a bayesian regression on the new data. The relevant R-code looks as follows

f1 <- stan_glm(CaseCount~date, data=omwave, prior=normal(location=0.4, scale=0.1))

According to R documentation this means we are only setting a prior for the regression coefficients, and we are leaving prior_intercept and prior_aux blank. After reading through the Turing tutorials this was my best attempt at recreating this simple model:

# Bayesian Linear Regression
@model function linear_regression(x,y)
    #setting an uniformative variance prior
    ϵ ~ Flat()

    #set uninformative intercept prior.
    a ~ Flat()
    
    #set the prior for our coefficient
    b ~ Normal(0.455,0.125)

    #setting up y∼N(a+xᵢ,σ)
    mu = a .+ b*x
    y~MvNormal(mu,ϵ)

end

#Now I sample 
model_o = linear_regression(omicron_wave.Days,log.(omicron_wave.CaseCount))
chain   = sample(model_o,NUTS(0.65),3_000);

The results that I get are WAAAY less stable than the R-version and all over the place on different runs. I think it’s because I am not using the Flat() distribution correctly? I thought if you ran a Bayesian regression with uninformative priors you would recover the results of least square, but when I try and set all of the parameters here to flat I get really REALLY wonky results.

I think I am failing to understand how to represent a part of your model which you have uninformative priors for. Any help would be appreciated!

Hi, a few notes. First, rstanarm (and brms, bambi, etc) do a lot of convenient transformations/rescaling automatically to improve sampling efficiency that you will need to do yourself if you write the model in Turing (or Stan, etc).

I highly recommend reading Prior Distributions for rstanarm Models • rstanarm. Importantly, none of these parameters have flat default priors. They give the default (unscaled) prior for ϵ (prior_aux) as Exponential(1) and the default (unscaled, after centering) prior for a as Normal(20, 2.5). Also, rstanarm internally centers the predictors (subtracts their mean). I’m not certain if it also divides them by their standard deviation, but it at least uses the standard deviation to rescale the priors.

In case you’re not aware, there’s also TuringGLM, which aims to be the brms of Turing.

1 Like

@sethaxen beat me. I was typing an answer. But it is pretty much this. rstanarm centers and scales the priors to mean 0 and std 1.

Both brms and rstanarm shifts either the data (in the case of the brms, and reconstructs the original scale back after sampling) or the priors (in the case of rstanarm that does not touch data and only centers and scales priors with autoscale = TRUE on by default on default priors).

If you reproduce the scaling and centering on the priors, you should have similar sampling performance and estimates.

EDIT: you can see original Stan code in brms with make_stancode and make_standata. I consider those “entry drugs” for probabilistic programming :slight_smile:

2 Likes

Oh wow these are great responses. I will definitely checkout TuringGLM. I think it would be a good exercise to try and do this myself with the base package, so thank you for the detailed responses! This gives me a lot to try and play around with but I have one remaining question.

On paper, if I replace the prior distributions with a flat distribution (and pretend it has constant value and integrates to one somehow) I recover the formula for maximum likelihood estimates (or something at least proportional). So for a Regression with Gaussian errors, I thought that setting all the prior destributions to something like Flat() would just reproduce a Lsq regression result? I think you see this with stan_glm if you set all the priors to NULL. I’ll check using the resources you gave me, but isn’t NULL similar to Flat()?

Thanks again for the top notch answers!

That suggested reading is exactly what I was looking for, thank you!

UPDATE: I applied the same scaling used by rstanarm and things improved dramatically Thank you :smiley: !

2 Likes