Why assign a high variance to the prior in Bayesian linear regression?

I am practicing Bayesian linear regression by implementing the model from the Linear Regression tutorial. There, they assign the prior truncated(Normal(0, 100), 0, Inf) to the variance. Can someone explain why they have assigned such a high variance? I am also guessing this is the distribution for the epsilon, which is basically the error?

# Bayesian linear regression.
using Turing

@model function linear_regression(x, y)
    # Set the prior on the error variance.
    σ₂ ~ truncated(Normal(0, 100), 0, Inf)

    # Set intercept prior.
    intercept ~ Normal(0, sqrt(3))

    # Set the priors on our coefficients:
    # a zero-mean isotropic prior with standard deviation sqrt(10) (older MvNormal constructor).
    nfeatures = size(x, 2)
    coefficients ~ MvNormal(nfeatures, sqrt(10))

    # Calculate all the mu terms.
    mu = intercept .+ x * coefficients
    y ~ MvNormal(mu, sqrt(σ₂))
end

Just to be clear: your question appears to be about a fundamental concept of Bayesian inference, not the implementation of any particular technique in Julia.

I’d suggest reading about the reasons why one uses informative or non-informative priors. Basically, the higher the variance of your priors, the fewer assumptions you are making about your model before considering your observed data.

There are several theoretical and practical concerns to consider when choosing your priors, and it’s important to have some understanding of those before using Bayesian techniques.
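
For intuition, here is a small prior-predictive sketch (my own illustration; the slope priors and the predictor value are made up, they are not the tutorial’s choices) showing that a “vague” prior is still an assumption about the model:

# Prior-predictive sketch: how much variation in the prediction x*β do the
# priors imply before seeing any data? (illustrative values only)
using Distributions, Statistics, Random

Random.seed!(1)
x = 2.0                                   # a typical standardized predictor value

β_narrow = rand(Normal(0, 1), 10_000)     # weakly informative slope prior
β_wide   = rand(Normal(0, 100), 10_000)   # vague slope prior

println("std of x*β under Normal(0, 1):   ", std(x .* β_narrow))
println("std of x*β under Normal(0, 100): ", std(x .* β_wide))
# The vague prior treats predictions hundreds of units away from zero as
# plausible, so it still encodes an assumption, just a different (and often
# unintended) one.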


Yes, the notation y ~ Normal(μ, σ) is equivalent to y = μ + ε with ε ~ Normal(0, σ).
Keep in mind that a variance on the scale of about 100 only corresponds to a standard deviation of about 10, though I agree that this prior is a bit wide by today’s standards (assuming x and y are on a roughly standard normal scale).
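
If it helps, a quick simulation (my own illustration, not part of the original answer) confirms that the two formulations describe the same distribution for y:

# y ~ Normal(μ, σ) and y = μ + ε with ε ~ Normal(0, σ) agree.
using Distributions, Statistics, Random

Random.seed!(2)
μ, σ = 1.5, 2.0
y_direct = rand(Normal(μ, σ), 100_000)        # sample y directly
y_error  = μ .+ rand(Normal(0, σ), 100_000)   # sample the error term and shift it

println("means: ", mean(y_direct), " vs ", mean(y_error))
println("stds:  ", std(y_direct),  " vs ", std(y_error))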

Priors like this were popular about 10 years ago; the dangers of such vague priors have only been appreciated relatively recently. See the paper “The prior can generally only be understood in the context of the likelihood” (arXiv:1708.07487) for a related discussion.
The idea was often to “let the data speak for itself” and to “be conservative” (in the sense of not letting the prior influence the posterior too much), but we now know that wide priors can sometimes influence the model in unexpected ways.
A common example showing that high variance does not necessarily mean fewer assumptions is logistic regression: in a model like y \sim \text{Bernoulli}(\text{inv\_logit}(\theta)) with \theta = \beta_0 + \beta x, a wide prior for \beta can allocate a lot of probability mass on values of \text{inv\_logit}(\theta) near 0 and 1.
Whether this is desirable or not depends on the context of course, but usually the goal the modeler had in mind was not to encode strong prior assumptions into the model.
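
To make that concrete, here is a rough prior-predictive sketch (my own, with illustrative values) of where the implied success probabilities end up when the slope prior is very wide:

# With a vague prior on β, the implied probabilities inv_logit(θ) pile up
# near 0 and 1. (illustrative values only)
using Distributions, Statistics, Random

inv_logit(θ) = 1 / (1 + exp(-θ))

Random.seed!(3)
x = 1.0                                   # a typical standardized predictor value
β = rand(Normal(0, 100), 10_000)          # vague slope prior
p = inv_logit.(β .* x)

println("share of prior draws with p < 0.01 or p > 0.99: ",
        mean((p .< 0.01) .| (p .> 0.99)))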

Though for a model like this, the justification for a prior like \sigma \sim \text{Normal}(0, 100) is often simply that it does not matter. The error scale is usually well identified for linear models with normal likelihoods, so it really makes no difference whether you use 10, 100, or 1e6 as the prior’s standard deviation.
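
As a sketch of that last point (my own illustration; the model, variable names, and data here are made up for the demonstration, not taken from the tutorial), fitting the same simple model with very different prior scales for σ gives essentially the same posterior once there is a reasonable amount of data:

# The posterior for σ barely changes between a prior scale of 10 and 1e6.
using Turing, Distributions, Statistics, Random

@model function simple_reg(x, y, prior_scale)
    σ ~ truncated(Normal(0, prior_scale), 0, Inf)
    β ~ Normal(0, 10)
    for i in eachindex(y)
        y[i] ~ Normal(β * x[i], σ)
    end
end

Random.seed!(4)
x = randn(200)
y = 2.0 .* x .+ randn(200)                # true error scale is 1

chain_10  = sample(simple_reg(x, y, 10.0), NUTS(), 1_000; progress = false)
chain_1e6 = sample(simple_reg(x, y, 1e6),  NUTS(), 1_000; progress = false)

println("posterior mean of σ, prior scale 10:  ", mean(chain_10[:σ]))
println("posterior mean of σ, prior scale 1e6: ", mean(chain_1e6[:σ]))
# Both should land close to 1: with 200 observations the likelihood dominates.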
