What is the difference between the two methods of lm in GLM.jl?

Each of the two methods produces a different result:

using DataFrames
using GLM

new_x = rand(100,3)
new_y = rand(100)

GLM.lm(new_x, new_y)

This produces:

LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
───────────────────────────────────────────────────────────────
        Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────
x1  0.295717    0.096847   3.05    0.0029   0.103502   0.487931
x2  0.462823    0.0882726  5.24    <1e-06   0.287627   0.63802
x3  0.0857036   0.0956069  0.90    0.3722  -0.10405    0.275457
───────────────────────────────────────────────────────────────
data_f = DataFrame(x1 = new_x[:,1], x2 = new_x[:,2], x3 = new_x[:,3], y = new_y)

ols = GLM.lm(@formula(y ~ x1 + x2), data_f)

This produces:

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 1 + x1 + x2

Coefficients:
──────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error     t  Pr(>|t|)   Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)  0.374513     0.0745926  5.02    <1e-05   0.226467    0.522558
x1           0.00540252   0.100866   0.05    0.9574  -0.194789    0.205594
x2           0.170571     0.0973294  1.75    0.0828  -0.0226013   0.363743
──────────────────────────────────────────────────────────────────────────

Why?

Thanks in advance for your help.

When using the formula, an intercept is added automatically. When passing a design matrix, it is taken as given, so you would have to add a column of ones to match the formula behavior:

julia> GLM.lm(@formula(y ~ x1 + x2 + x3), data_f)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 1 + x1 + x2 + x3

Coefficients:
──────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)   0.472579   0.0959524   4.93    <1e-05   0.282115   0.663043
x1           -0.133307   0.106589   -1.25    0.2141  -0.344884   0.0782694
x2            0.086871   0.104626    0.83    0.4084  -0.120809   0.294551
x3            0.131423   0.101629    1.29    0.1991  -0.0703094  0.333155
──────────────────────────────────────────────────────────────────────────

julia> GLM.lm([ones(size(new_y, 1)) new_x], new_y)
LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
─────────────────────────────────────────────────────────────────
        Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────
x1   0.472579   0.0959524   4.93    <1e-05   0.282115   0.663043
x2  -0.133307   0.106589   -1.25    0.2141  -0.344884   0.0782694
x3   0.086871   0.104626    0.83    0.4084  -0.120809   0.294551
x4   0.131423   0.101629    1.29    0.1991  -0.0703094  0.333155
─────────────────────────────────────────────────────────────────
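
Going the other way, the formula fit can also be made to match the original matrix call: StatsModels drops the intercept when the formula contains a 0 (or -1) term. A minimal sketch, reusing data_f, new_x, and new_y from above (the variable name no_intercept is just for illustration):

# A 0 (or -1) term in the formula suppresses the intercept, so the model
# matrix is just [x1 x2 x3], matching GLM.lm(new_x, new_y) above.
no_intercept = GLM.lm(@formula(y ~ 0 + x1 + x2 + x3), data_f)

# The coefficients should then agree with the matrix-based fit.
coef(no_intercept) ≈ coef(GLM.lm(new_x, new_y))  # expected: true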

Thanks again for your help, Andreas!