# Getting model parameter names from a DataFrame

Suppose I have

```julia
using DataFrames

df = DataFrame(
    age=[20, 30, 40],
    height=[80, 130, 200],
)
weight = [100, 120, 200]
```

I want to predict `weight` from the other columns using linear regression. There are two options: write each variable into the model explicitly, or build a covariate matrix `x` and pass it to the model.

The linear regression tutorial uses a covariate matrix `x`:

```julia
# Bayesian linear regression.
@model function linear_regression(x, y)
    # Set variance prior.
    σ₂ ~ truncated(Normal(0, 100), 0, Inf)

    # Set intercept prior.
    intercept ~ Normal(0, sqrt(3))

    # Set the priors on our coefficients.
    nfeatures = size(x, 2)
    coefficients ~ MvNormal(nfeatures, sqrt(10))

    # Calculate all the mu terms.
    mu = intercept .+ x * coefficients
    y ~ MvNormal(mu, sqrt(σ₂))
end
```

It is easy to use `Array(df)` as my covariate matrix, but then all my coefficients get opaque names like `coefficients[2]` in the output:

```
Summary Statistics
     parameters     mean     std  naive_se    mcse       ess   r_hat
───────────────  ───────  ──────  ────────  ──────  ────────  ──────
coefficients[1]  -0.0413  0.5648    0.0126  0.0389  265.1907  1.0010
coefficients[2]   0.2770  0.6994    0.0156  0.0401  375.2777  1.0067
      intercept   0.0058  0.1179    0.0026  0.0044  580.0222  0.9995
             σ₂   0.3017  0.1955    0.0044  0.0132  227.2322  1.0005
```
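For comparison, the first option (writing each variable out) does give readable names, at the cost of rewriting the model whenever the columns change. A rough sketch only: the names `explicit_regression`, `β_age`, and `β_height` and their priors are my own placeholders mirroring the tutorial model, and I have written the likelihood as `MvNormal(mu, σ₂ * I)` (variance on the diagonal), which should be equivalent to the tutorial's `MvNormal(mu, sqrt(σ₂))` form.

```julia
using Turing, LinearAlgebra

# Option 1 sketch: one named scalar coefficient per predictor.
# β_age, β_height and their priors are illustrative assumptions.
@model function explicit_regression(age, height, weight)
    # Variance and intercept priors, copied from the tutorial model.
    σ₂ ~ truncated(Normal(0, 100), 0, Inf)
    intercept ~ Normal(0, sqrt(3))

    # One prior per predictor, so each coefficient carries its own name.
    β_age ~ Normal(0, sqrt(10))
    β_height ~ Normal(0, sqrt(10))

    # Linear predictor and likelihood (σ₂ is the variance).
    mu = intercept .+ β_age .* age .+ β_height .* height
    weight ~ MvNormal(mu, σ₂ * I)
end

# Same numbers as the example data above, written as Float64 vectors.
model = explicit_regression(
    Float64[20, 30, 40],
    Float64[80, 130, 200],
    Float64[100, 120, 200],
)
```

Sampling from `model` should then report `β_age` and `β_height` directly in the summary, but this does not scale to many columns.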

Is it possible to use the `names()` from the dataframe to create the coefficient names? This could happen (1) during model building or (2) after the trace is constructed. I think doing it during model building is the most flexible, since every downstream part of the analysis would then include the names automatically.

The `StatsModels` package provides a formula language to convert from a symbolic description of a regression-like model to the model matrix.

β¦and @cpfiffer at some point had mocked up a brms-style integration with Turing, I think it was here: https://github.com/cpfiffer/BayesModels

If you want to use StatsModels directly, then you could do something like

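```julia
using StatsModels

# The formula looks up the response as a column of `df`.
df.weight = weight

f = @formula(weight ~ 1 + age + height)
f_concrete = apply_schema(f, schema(df))

# Use the concrete formula (schema applied), not the abstract one.
y, x = modelcols(f_concrete, df)
# ... turing magic

respname, prednames = coefnames(f_concrete)
```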

Note that this way the intercept is included in `x`, so you'd either have to modify your model or force no intercept by using `@formula(weight ~ 0 + age + height)`.
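To make the last step concrete, here is a hedged sketch of attaching the predictor names after sampling, using `replacenames` from MCMCChains. It assumes the no-intercept formula and that `weight` is stored as a column of `df`. The chain below is a placeholder filled with random numbers so the snippet runs without Turing; in practice it would come from something like `sample(linear_regression(x, y), NUTS(), 1_000)`.

```julia
using DataFrames, StatsModels, MCMCChains

df = DataFrame(age=[20, 30, 40], height=[80, 130, 200], weight=[100, 120, 200])

# `0 +` drops the intercept column, since the model has its own `intercept`.
f = apply_schema(@formula(weight ~ 0 + age + height), schema(df))
y, x = modelcols(f, df)
respname, prednames = coefnames(f)

# Placeholder chain (random draws, NOT real posterior samples) with the same
# parameter names the tutorial model produces.
chain = Chains(
    randn(100, 4, 1),
    ["coefficients[1]", "coefficients[2]", "intercept", "σ₂"],
)

# Map each opaque name to the matching predictor name, in column order.
renames = Dict("coefficients[$i]" => name for (i, name) in enumerate(prednames))
named_chain = replacenames(chain, renames)
```

This is the "after the trace is constructed" route: the model itself still only sees the matrix `x`, and the mapping relies on the columns of `x` being in the same order as `prednames`.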

One of these days I may have to turn that from a mockup into a real package.

itβd make a great GSOC project actuallyβ¦pretty clear scope and just need someone to do itβ¦

Oh, very true! I'll keep that in mind when we're adding projects for Turing.