What is the difference between the two methods of `lm` in GLM.jl?

When using the formula interface, the intercept is added automatically. When passing a design matrix, it's taken as given, so you'd have to add a column of ones yourself to match the formula behavior:

julia> GLM.lm(@formula(y ~ x1 + x2 + x3), data_f)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 1 + x1 + x2 + x3

Coefficients:
──────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)   0.472579   0.0959524   4.93    <1e-05   0.282115   0.663043
x1           -0.133307   0.106589   -1.25    0.2141  -0.344884   0.0782694
x2            0.086871   0.104626    0.83    0.4084  -0.120809   0.294551
x3            0.131423   0.101629    1.29    0.1991  -0.0703094  0.333155
──────────────────────────────────────────────────────────────────────────

julia> GLM.lm([ones(size(new_y, 1)) new_x], new_y)
LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
─────────────────────────────────────────────────────────────────
        Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────
x1   0.472579   0.0959524   4.93    <1e-05   0.282115   0.663043
x2  -0.133307   0.106589   -1.25    0.2141  -0.344884   0.0782694
x3   0.086871   0.104626    0.83    0.4084  -0.120809   0.294551
x4   0.131423   0.101629    1.29    0.1991  -0.0703094  0.333155
─────────────────────────────────────────────────────────────────
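To see the two interfaces line up, here is a minimal sketch. It assumes GLM and DataFrames are installed; the data and variable names (`df`, `m_formula`, `m_matrix`) are illustrative, not from the post above:

```julia
using GLM, DataFrames

# Simulated data for illustration
n = 100
x1, x2 = randn(n), randn(n)
y = 0.5 .+ 0.3 .* x1 .- 0.2 .* x2 .+ 0.1 .* randn(n)
df = DataFrame(y = y, x1 = x1, x2 = x2)

# Formula interface: intercept column is added automatically
m_formula = lm(@formula(y ~ x1 + x2), df)

# Matrix interface: prepend a column of ones by hand
m_matrix = lm([ones(n) x1 x2], y)

# The fitted coefficients agree (up to floating-point noise)
coef(m_formula) ≈ coef(m_matrix)
```

Conversely, if you want the formula fit *without* an intercept, you can write `@formula(y ~ 0 + x1 + x2)` instead of dropping the ones column from the matrix.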