Suppose I have
df = DataFrame(
    age=[20, 30, 40],
    height=[80, 130, 200],
)
weight = [100, 120, 200]
I want to predict weight from the other columns using linear regression. There are two options: explicitly write each variable into the model, or build a covariate matrix X and pass it into the model.
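To make the two options concrete, here is a minimal sketch in plain Julia showing that they give the same linear predictor. The columns are plain vectors so nothing beyond Base is needed, and `intercept`, `b_age` and `b_height` are hypothetical placeholder values, not fitted estimates.

```julia
# Columns from the example, kept as plain vectors so no packages are required.
age = [20, 30, 40]
height = [80, 130, 200]

# Arbitrary placeholder values (assumptions for illustration, not estimates).
intercept = 1.0
b_age, b_height = 0.5, 0.25

# Option 1: every variable is written out explicitly.
mu_explicit = intercept .+ b_age .* age .+ b_height .* height

# Option 2: stack the columns into a covariate matrix X, one column per variable.
X = hcat(age, height)
coefficients = [b_age, b_height]
mu_matrix = intercept .+ X * coefficients

# Both routes produce the same vector of means.
println(mu_explicit ≈ mu_matrix)
```

The only real difference is bookkeeping: in option 1 each coefficient carries its own name, while in option 2 the name is implied by the column position in X.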
The linear regression tutorial uses a covariate matrix x:
# Bayesian linear regression.
@model function linear_regression(x, y)
    # Set variance prior.
    σ₂ ~ truncated(Normal(0, 100), 0, Inf)
    # Set intercept prior.
    intercept ~ Normal(0, sqrt(3))
    # Set the priors on our coefficients.
    nfeatures = size(x, 2)
    coefficients ~ MvNormal(nfeatures, sqrt(10))
    # Calculate all the mu terms.
    mu = intercept .+ x * coefficients
    y ~ MvNormal(mu, sqrt(σ₂))
end
It is easy to just use Array(df) as my covariate matrix, but then all my coefficients get opaque names like coefficients[2] in the output:
Summary Statistics
     parameters     mean     std  naive_se    mcse       ess   r_hat
───────────────  ───────  ──────  ────────  ──────  ────────  ──────
coefficients[1]  -0.0413  0.5648    0.0126  0.0389  265.1907  1.0010
coefficients[2]   0.2770  0.6994    0.0156  0.0401  375.2777  1.0067
      intercept   0.0058  0.1179    0.0026  0.0044  580.0222  0.9995
             σ₂   0.3017  0.1955    0.0044  0.0132  227.2322  1.0005
Is it possible to use the names() of the DataFrame to create the coefficient names? This could happen (1) during model building or (2) after the trace is constructed. I think doing it during model building is the most flexible, because every later part of the analysis would then automatically include the names.
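For option (2), the renaming step I have in mind could look roughly like the sketch below. It is plain Julia and only builds the name mapping. `feature_names` stands in for what I would expect names(df) to return, and `name_map`, `params` and the `coef_` prefix are hypothetical names of my own.

```julia
# Stand-in for names(df); assumed to be a vector of column-name strings.
feature_names = ["age", "height"]

# Map each positional parameter name to one built from the column name.
name_map = Dict(
    "coefficients[$i]" => "coef_$(name)" for (i, name) in enumerate(feature_names)
)

# Stand-in for the parameter names reported in the summary table.
params = ["coefficients[1]", "coefficients[2]", "intercept"]

# Rename where a mapping exists and leave every other parameter untouched.
renamed = [get(name_map, p, p) for p in params]

println(renamed)
```

If the trace is an MCMCChains object, I believe a mapping like this can be handed to `replacenames(chain, name_map)`, but I have not checked that against my installed version, so treat that call as an assumption.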