How to get a GLM where formula is programmatically generated

mkarikom · July 22, 2020, 9:41pm

I have a DataFrame and need to build a model where the predictors follow some naming scheme.

For the example data below, suppose the scheme is "every column whose name is not y" (that is, name != "y"):

using DataFrames
using GLM

data = DataFrame(y=[22.1,20.1,7.1,9.1,1000,200],
                 x1=[1.1,2.1,3.1,4.1,10,100.2],
                 x2=[1,2,3,4.0,11.2,100.1])

Can anyone suggest a modification to Ex.2 below that would make the models in Ex.1 (ols1) and Ex.2 (ols2) equivalent?

Please note that while ols2 does not run, I’m looking for something of comparable terseness, if possible.

Ex.1

ols1 = GLM.lm(@formula(y ~ x1 + x2), data)
y ~ 1 + x1 + x2

Coefficients:
────────────────────────────────────────────────────────────────────────────
              Estimate  Std. Error   t value  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept)    84.4784     4.76541   17.7274    0.0004    69.3127     99.644
x1           -745.249      8.33275  -89.4361    <1e-5   -771.767    -718.73
x2            747.144      8.34614   89.5196    <1e-5    720.582     773.705

Ex.2

preds = names(data)[names(data) .!= "y"]
2-element Array{String,1}:
 "x1"
 "x2"

ols2 = GLM.lm(@formula(y ~ preds), data)
ERROR: type NamedTuple has no field preds

CameronBieganek · July 22, 2020, 11:36pm

You can probably make use of Terms objects, as described here.

dave.f.kleinschmidt · August 16, 2020, 4:08pm

Cameron is right. These two forms are exactly equivalent:

julia> using StatsModels

julia> (Term(:y) ~ Term(:x1) + Term(:x2)) == @formula(y ~ x1 + x2)
true

As @nilshg pointed out in this post, this is actually the expression that’s generated by the formula macro:

julia> @macroexpand @formula(y ~ x1 + x2)
:(StatsModels.Term(:y) ~ StatsModels.Term(:x1) + StatsModels.Term(:x2))
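
Given that expansion, here is a minimal sketch of why Ex.2 fails and what the Term-based fix might look like. The macro sees the name `preds` literally, not the vector bound to it, so it builds a single term for a column called `preds`. The variable names `literal` and `built` are only for illustration.

```julia
using StatsModels

preds = [:x1, :x2]

# @formula captures the symbol `preds` itself, not the vector it refers to,
# so this is a formula with one predictor term named :preds.
literal = @formula(y ~ preds)
@show literal == (Term(:y) ~ Term(:preds))

# Building one Term per element and adding them gives the intended formula.
built = Term(:y) ~ sum(Term.(preds))
@show built == @formula(y ~ x1 + x2)
```

Since no column named `preds` exists in the data, fitting `literal` is what produces the "has no field preds" error.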

dave.f.kleinschmidt · September 1, 2020, 3:56pm

@nilshg just added support for creating a Term from a string (available in StatsModels v0.6.14, which should automerge soon), so you can now do something like:

terms = term.(names(data))
f = terms[1] ~ sum(terms[2:end]) # assuming first column is "y"
ols2 = lm(f, data)
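
If the response is not guaranteed to be the first column, one possible variant selects the predictors by name instead of by position. This is only a sketch: `formula_all_but` is a hypothetical helper, not part of GLM or StatsModels, and it assumes a StatsModels version where `term` accepts strings (v0.6.14 or later, per the post above).

```julia
using DataFrames, GLM, StatsModels

# Hypothetical helper: build `response ~ (every other column of df)`.
function formula_all_but(df::AbstractDataFrame, response::AbstractString)
    # names(df) returns column names as strings; drop the response.
    preds = filter(!=(response), names(df))
    # term(...) turns each string into a Term; sum adds them with `+`.
    return term(response) ~ sum(term.(preds))
end

data = DataFrame(y=[22.1, 20.1, 7.1, 9.1, 1000, 200],
                 x1=[1.1, 2.1, 3.1, 4.1, 10, 100.2],
                 x2=[1, 2, 3, 4.0, 11.2, 100.1])

f = formula_all_but(data, "y")
ols2 = lm(f, data)
```

Note that `sum` over an empty vector of terms will throw, so the helper as written needs at least one predictor column.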
