Linear regression without the intercept term

In GLM.jl, using a DataFrame is preferred, but the lm function also supports plain vectors and matrices. In the latter case, however, I can’t fit without the intercept term (i.e. b0 = 0). Is there a way to do this without using a DataFrame?

Looking at the source code, the X argument of the lm function has to be an AbstractMatrix, not an AbstractVector.
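For reference, the relevant definition in src/lm.jl looks roughly like this (paraphrased from the version I have installed; details may differ between releases):

# lm is a thin wrapper around fit, and the LinearModel fit method
# is only defined for an AbstractMatrix of predictors:
lm(X, y, allowrankdeficient::Bool=false) = fit(LinearModel, X, y, allowrankdeficient)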

Of course, when using a DataFrame there is no problem at all; this question is just out of curiosity :slight_smile:

julia> using GLM, DataFrames; x = [1,2,3]; y = [2,5,7]; data = DataFrame(X = x, Y = y);

julia> ols = lm(@formula(Y ~ 0 + X), data)
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: Y ~ +X

Coefficients:
     Estimate Std.Error t value Pr(>|t|)
X     2.35714 0.0874818 26.9444   0.0014


julia> ols = lm(x, y) # regression without the intercept term
ERROR: MethodError: no method matching fit(::Type{LinearModel}, ::Array{Int64,1}, ::Array{Int64,1}, ::Bool)
Closest candidates are:
  fit(::Type{StatsBase.Histogram}, ::Any...; kwargs...) at C:\Users\leejm516\.julia\packages\StatsBase\56Djy\src\hist.jl:319
  fit(::StatsBase.StatisticalModel, ::Any...) at C:\Users\leejm516\.julia\packages\StatsBase\56Djy\src\statmodels.jl:151
  fit(::Type{D<:Distributions.Distribution}, ::Any...) where D<:Distributions.Distribution at C:\Users\leejm516\.julia\packages\Distributions\WHjOk\src\genericfit.jl:34
  ...
Stacktrace:
 [1] lm(::Array{Int64,1}, ::Array{Int64,1}, ::Bool) at C:\Users\leejm516\.julia\packages\GLM\0c65q\src\lm.jl:146 (repeats 2 times)
 [2] top-level scope at none:0

julia> ols = lm([ones(3) x], y) # regression with the intercept term
LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
      Estimate Std.Error   t value Pr(>|t|)
x1   -0.333333   0.62361 -0.534522   0.6875
x2         2.5  0.288675   8.66025   0.0732



You just have to make x a matrix.

julia> using GLM; x = [1,2,3]; y = [2,5,7];

julia> lm(reshape(x, length(x), 1), y)
LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
     Estimate Std.Error t value Pr(>|t|)
x1    2.35714 0.0874818 26.9444   0.0014
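Any of the standard ways of turning a vector into a one-column matrix work; for example (same data as above):

julia> lm(hcat(x), y)           # hcat of a single vector gives a 3×1 Matrix

julia> lm(reshape(x, :, 1), y)  # `:` lets reshape infer the first dimension

Both fit exactly the same no-intercept model as reshape(x, length(x), 1).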

We should probably add a convenience function for this, as it keeps coming up.
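Until then, a one-line convenience method is easy to define yourself. This is just a sketch, not part of GLM.jl, and the name vec_lm is made up:

# Hypothetical helper -- not provided by GLM.jl:
vec_lm(x::AbstractVector, y::AbstractVector) = lm(reshape(x, :, 1), y)

vec_lm(x, y)  # same fit as lm(reshape(x, length(x), 1), y)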


Yeah… what a silly question :slight_smile: Thank you for the kind reply.

The next step, obtaining the fitted values, was missing for me:

ops = lm(reshape(x, length(x), 1), y)
y_linreg_fitted = coef(ops)[1] .* x  # coef returns a one-element vector, so take the slope

or simply

fitted(ops)
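Both approaches agree; a quick check in the REPL (continuing from the definitions above):

julia> ops = lm(reshape(x, length(x), 1), y);

julia> fitted(ops) ≈ coef(ops)[1] .* x  # predict(ops) gives the same vector
true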

How do you do this for logistic regression?

Same thing?

julia> using GLM

julia> df = (x = rand(10), y = rand(Bool, 10));

julia> glm(@formula(y ~ 0 + x), df, Bernoulli(), LogitLink())
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Bernoulli{Float64}, LogitLink}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 0 + x

Coefficients:
──────────────────────────────────────────────────────────────
      Coef.  Std. Error      z  Pr(>|z|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────
x  -1.32166     1.47523  -0.90    0.3703   -4.21306    1.56975
──────────────────────────────────────────────────────────────

julia> glm(reshape(df.x, 10, 1), df.y, Bernoulli(), LogitLink())
GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Bernoulli{Float64}, LogitLink}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
───────────────────────────────────────────────────────────────
       Coef.  Std. Error      z  Pr(>|z|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────
x1  -1.32166     1.47523  -0.90    0.3703   -4.21306    1.56975
───────────────────────────────────────────────────────────────
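One thing to keep in mind: for a GLM, predict (and fitted) return values on the response scale, i.e. the inverse link is already applied. A quick check, reusing the model above (the explicit logistic formula is just for illustration):

julia> model = glm(reshape(df.x, 10, 1), df.y, Bernoulli(), LogitLink());

julia> predict(model) ≈ 1 ./ (1 .+ exp.(-coef(model)[1] .* df.x))  # inverse logit of Xβ
true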