When using the formula, the intercept is automatically added. When passing a design matrix, it’s taken as given so you’d have to add a column on ones to match the formula behavior
julia> GLM.lm(@formula(y ~ x1 + x2 + x3), data_f)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
y ~ 1 + x1 + x2 + x3
Coefficients:
──────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept) 0.472579 0.0959524 4.93 <1e-05 0.282115 0.663043
x1 -0.133307 0.106589 -1.25 0.2141 -0.344884 0.0782694
x2 0.086871 0.104626 0.83 0.4084 -0.120809 0.294551
x3 0.131423 0.101629 1.29 0.1991 -0.0703094 0.333155
──────────────────────────────────────────────────────────────────────────
julia> GLM.lm([ones(size(new_y, 1)) new_x], new_y)
LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:
Coefficients:
─────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────
x1 0.472579 0.0959524 4.93 <1e-05 0.282115 0.663043
x2 -0.133307 0.106589 -1.25 0.2141 -0.344884 0.0782694
x3 0.086871 0.104626 0.83 0.4084 -0.120809 0.294551
x4 0.131423 0.101629 1.29 0.1991 -0.0703094 0.333155
─────────────────────────────────────────────────────────────────