Using all independent variables with @formula in a multiple linear model

Hi, I’m trying to learn how to fit linear models and wanted to know if there is something like R’s dot ("."), which stands in for all the independent variables in a linear regression model so you don’t have to write them out. For example, in R:

LinearRegressor = lm(Y ~ ., data = dataset)

In Julia, I’m using GLM like this:

fm = @formula(Y ~ X1 + X2 + … + Xn)
LinearRegressor = lm(fm, df)

and I have had no problems so far, but now I need to fit a linear model with 28 independent variables and wanted to know if there is a simple way to do it.

Thanks.


The dot notation does not exist at the moment in StatsModels, so the best bet is to programmatically build up the formula using the Term constructor directly:

lm(Term(:Y) ~ sum(Term.(Symbol.(names(df[:, Not(:Y)])))), df)

Edit2: OK, it is working now, but I have to use your solution directly inside the lm() call instead of being able to create a @formula variable. It would be nice to be able to create a @formula variable, especially for cross-validation and other tasks that need to work with one. But for the time being, thank you again.

Thanks, it was giving me an error:

LoadError: ArgumentError: non-call expression encountered: Term.(Symbol.(names(df[:, Not(:Y])))

but I solved it by creating a variable holding the sum of the independent variables, like this:

IndependVars = sum(Term.(Symbol.(names(df[:, Not(:Y)]))))

and using it as

fm = @formula(Y ~ IndependVars)

Edit: my solution is not working; lm thinks that IndependVars is a column in my df…

You are on the right track, but you don’t need the @formula call when constructing names programmatically.

julia> using DataFrames, GLM

julia> df = DataFrame(y = rand(100), x1 = rand(100), x2 = rand(100));

julia> lm(term(:y) ~ sum(term.(Symbol.(names(df, Not(:y))))), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

y ~ 1 + x1 + x2

Coefficients:
───────────────────────────────────────────────────────────────────────────────
               Estimate  Std. Error    t value  Pr(>|t|)  Lower 95%   Upper 95%
───────────────────────────────────────────────────────────────────────────────
(Intercept)   0.64731     0.0719799   8.99292     <1e-13   0.504449   0.79017
x1            0.0392958   0.0932668   0.421327    0.6744  -0.145813   0.224405
x2           -0.259494    0.100676   -2.57752     0.0115  -0.459308  -0.0596804
───────────────────────────────────────────────────────────────────────────────

This is definitely not easy syntax. But overall the system is a bit more flexible than R, where using the rest of the names is easy but using a specific list of names for independent variables is very hard.
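On the wish for a reusable formula variable: the expression on the left of the comma in the lm call above is an ordinary value, so it can probably just be assigned to a name and passed around. Here is a minimal sketch of that, assuming GLM re-exports term and ~ from StatsModels (as the example above suggests). The extra column x3, the hand-picked list in chosen, and the variable names are made up for illustration.

```julia
using DataFrames, GLM   # assumption: GLM re-exports term, ~ and FormulaTerm from StatsModels

df = DataFrame(y = rand(100), x1 = rand(100), x2 = rand(100), x3 = rand(100))

# All columns except the response, built the same way as in the lm call above.
all_rhs = sum(term.(Symbol.(names(df, Not(:y)))))
f_all = term(:y) ~ all_rhs            # the formula now lives in a variable

# A specific, hand-picked list of predictors (hypothetical choice).
chosen = [:x1, :x3]
f_some = term(:y) ~ sum(term.(chosen))

# Either formula can be reused for several fits, e.g. on different row subsets.
m_all  = lm(f_all, df)
m_some = lm(f_some, df[1:50, :])
```

Since f_all and f_some are plain values, the same formula can be handed to lm once per fold in a cross-validation loop without going through @formula at all.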


To slightly expand on Peter’s answer: what you tried originally was essentially constructing the Terms twice, because the @formula macro already converts its arguments to Terms. Since it is a macro, it ultimately has to generate code that you could have written by hand. To see this:

julia> using MacroTools

julia> @macroexpand @formula(Y ~ X1 + X2)
:(StatsModels.Term(:Y) ~ StatsModels.Term(:X1) + StatsModels.Term(:X2))

julia> @formula(Y ~ X1 + X2) == (Term(:Y) ~ Term(:X1) + Term(:X2))
true