Using all independent variables with @formula in a multiple linear model

Hi, I’m trying to learn how to do linear models and wanted to know if there is something like in R, where you can use a dot (".") instead of writing out all the independent variables in a linear regression model. For example, in R:

LinearRegressor = lm(Y ~ ., data = dataset)

In Julia, I’m using GLM like this:

fm = @formula(Y ~ X1 + X2 + … + Xn)
LinearRegressor = lm(fm, df)

I have had no problems so far, but now I need to fit a linear model with 28 independent variables and wanted to know if there is a simple way to do it.

Thanks.

The dot notation does not exist at the moment in StatsModels, so the best bet is to programmatically build up the formula using the Term constructor directly:

lm(Term(:Y) ~ sum(Term.(Symbol.(names(df[:, Not(:Y)])))), df)

Edit 2: OK, it is working now, but I need to use your solution directly in the lm() call instead of being able to create a @formula variable. It would be nice to be able to create a @formula variable, especially for cross-validation and other tasks that need to work with one. But for the time being, thank you again.

Thanks, it was giving me an error:

LoadError: ArgumentError: non-call expression encountered: Term.(Symbol.(names(df[:, Not(:Y])))

but I solved it by creating a variable holding the sum of the independent variables, like this

IndependVars = sum(Term.(Symbol.(names(df[:, Not(:Y)]))))

and using it as

fm = @formula(Y ~ IndependVars)

Edit: my solution is not working; lm thinks that IndependVars is a column in my df…

You are on the right track, but you don’t need the @formula call when constructing names programmatically.

julia> using DataFrames, GLM

julia> df = DataFrame(y = rand(100), x1 = rand(100), x2 = rand(100));

julia> lm(term(:y) ~ sum(term.(Symbol.(names(df, Not(:y))))), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

y ~ 1 + x1 + x2

Coefficients:
───────────────────────────────────────────────────────────────────────────────
               Estimate  Std. Error    t value  Pr(>|t|)  Lower 95%   Upper 95%
───────────────────────────────────────────────────────────────────────────────
(Intercept)   0.64731     0.0719799   8.99292     <1e-13   0.504449   0.79017
x1            0.0392958   0.0932668   0.421327    0.6744  -0.145813   0.224405
x2           -0.259494    0.100676   -2.57752     0.0115  -0.459308  -0.0596804
───────────────────────────────────────────────────────────────────────────────

This is definitely not easy syntax, but overall the system is a bit more flexible than R, where using all the remaining column names is easy but using a specific list of names as independent variables is very hard.
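
On the wish for a formula variable to reuse in cross-validation: the expression built with term and ~ is an ordinary value, so it can presumably be stored and passed to lm() more than once. Here is a minimal sketch, assuming GLM re-exports term from StatsModels (as the example above relies on). The column names, the predictors list, and the row split are made up for illustration.

```julia
using DataFrames, GLM  # assumption: GLM re-exports StatsModels (term, FormulaTerm)

# Hypothetical data; substitute your own DataFrame.
df = DataFrame(y = rand(100), x1 = rand(100), x2 = rand(100), x3 = rand(100))

# A specific list of predictors (hypothetical choice).
predictors = [:x1, :x3]

# Build the formula once and keep it in a variable, no @formula needed.
fm = term(:y) ~ sum(term.(predictors))

# Reuse the same formula on different subsets, e.g. inside a cross-validation loop.
model_all   = lm(fm, df)
model_first = lm(fm, df[1:50, :])
```

To use every column except the response, swap the predictors line for Symbol.(names(df, Not(:y))).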


To slightly expand on Peter’s answer, what you tried originally was essentially constructing the Terms twice, as the @formula macro converts its arguments to Terms already. It being a macro, it ultimately has to generate some code that you could have written yourself by hand. To see this:

julia> using MacroTools

julia> @macroexpand @formula(Y ~ X1 + X2)
:(StatsModels.Term(:Y) ~ StatsModels.Term(:X1) + StatsModels.Term(:X2))

julia> @formula(Y ~ X1 + X2) == (Term(:Y) ~ Term(:X1) + Term(:X2))
true
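
The same reasoning suggests why @formula(Y ~ IndependVars) was read as a column name: the macro only sees the symbol IndependVars, not the value the variable holds. A small sketch to check this, assuming GLM re-exports term and @formula from StatsModels; the names X1 and X2 are just the ones from the example above.

```julia
using GLM  # assumption: GLM re-exports StatsModels (term, @formula)

# A variable holding two Terms combined with +.
IndependVars = term(:X1) + term(:X2)

# Inside @formula the name itself becomes a Term; the variable's value is never looked up.
fm_macro = @formula(Y ~ IndependVars)
macro_sees_symbol = fm_macro.rhs == term(:IndependVars)

# Outside the macro the variable is evaluated normally, giving the intended formula.
fm_manual = term(:Y) ~ IndependVars
manual_matches = fm_manual == @formula(Y ~ X1 + X2)
```

If both comparisons come out true, it confirms that the fix is to drop the macro and use ~ directly whenever the right-hand side lives in a variable.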