What is the difference between the two methods of lm in GLM.jl?

Each of the two methods produces a different result:

using DataFrames
using GLM

new_x = rand(100,3)
new_y = rand(100)

GLM.lm(new_x, new_y)

This produces:

LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
───────────────────────────────────────────────────────────────
        Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────
x1  0.295717    0.096847   3.05    0.0029   0.103502   0.487931
x2  0.462823    0.0882726  5.24    <1e-06   0.287627   0.63802
x3  0.0857036   0.0956069  0.90    0.3722  -0.10405    0.275457
───────────────────────────────────────────────────────────────
data_f = DataFrame(x1 = new_x[:,1], x2 = new_x[:,2], x3 = new_x[:,3], y = new_y)

ols = GLM.lm(@formula(y ~ x1 + x2), data_f)

This produces:

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 1 + x1 + x2

Coefficients:
──────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error     t  Pr(>|t|)   Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)  0.374513     0.0745926  5.02    <1e-05   0.226467    0.522558
x1           0.00540252   0.100866   0.05    0.9574  -0.194789    0.205594
x2           0.170571     0.0973294  1.75    0.0828  -0.0226013   0.363743
──────────────────────────────────────────────────────────────────────────

Why?

Thanks in advance for your help.

When using the formula, an intercept is added automatically. When passing a design matrix, it is taken as given, so you would have to add a column of ones to match the formula behavior:

julia> GLM.lm(@formula(y ~ x1 + x2 + x3), data_f)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 1 + x1 + x2 + x3

Coefficients:
──────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)   0.472579   0.0959524   4.93    <1e-05   0.282115   0.663043
x1           -0.133307   0.106589   -1.25    0.2141  -0.344884   0.0782694
x2            0.086871   0.104626    0.83    0.4084  -0.120809   0.294551
x3            0.131423   0.101629    1.29    0.1991  -0.0703094  0.333155
──────────────────────────────────────────────────────────────────────────

julia> GLM.lm([ones(size(new_y, 1)) new_x], new_y)
LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
─────────────────────────────────────────────────────────────────
        Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────
x1   0.472579   0.0959524   4.93    <1e-05   0.282115   0.663043
x2  -0.133307   0.106589   -1.25    0.2141  -0.344884   0.0782694
x3   0.086871   0.104626    0.83    0.4084  -0.120809   0.294551
x4   0.131423   0.101629    1.29    0.1991  -0.0703094  0.333155
─────────────────────────────────────────────────────────────────
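
Going the other way, the formula fit can also be made to match the original matrix call: StatsModels drops the intercept when the formula contains a 0 (or -1) term. A minimal sketch, reusing data_f, new_x, and new_y from above (the variable name no_intercept is just for illustration):

# A 0 (or -1) term in the formula suppresses the intercept, so the model
# matrix is just [x1 x2 x3], matching GLM.lm(new_x, new_y) above.
no_intercept = GLM.lm(@formula(y ~ 0 + x1 + x2 + x3), data_f)

# The coefficients should then agree with the matrix-based fit.
coef(no_intercept) ≈ coef(GLM.lm(new_x, new_y))  # expected: true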

Thanks again for your help, Andreas!