This linear regression fails in Julia (GLM) vs Python (sklearn)

I cannot figure out why the Julia and Python implementations yield different results for this "simple" linear regression on the following data. While sklearn is consistent with Excel, GLM fails: it returns a zero value for the slope.

  • Julia
using DataFrames, GLM

x = [6.20E-13, 5.83E-11, 1.06E-10, 1.70E-10, 2.64E-10, 3.79E-10, 4.51E-10,
     6.25E-10, 6.92E-10, 5.45E-10, 8.22E-10, 1.11E-09, 1.41E-09]
y = [0, 0.009836193, 0.017599565, 0.025362938, 0.040889682, 0.056416427, 0.071943172,
     0.087469917, 0.102996662, 0.098084522, 0.137400303, 0.175161579, 0.214743511]
data = DataFrame(x=x, y=y)
model = lm(@formula(y ~ x), data)
println(model)
  • Python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([6.20E-13, 5.83E-11, 1.06E-10, 1.70E-10, 2.64E-10, 3.79E-10, 4.51E-10,
              6.25E-10, 6.92E-10, 5.45E-10, 8.22E-10, 1.11E-09, 1.41E-09]).reshape(-1, 1)
y = np.array([0, 0.009836193, 0.017599565, 0.025362938, 0.040889682, 0.056416427, 0.071943172,
              0.087469917, 0.102996662, 0.098084522, 0.137400303, 0.175161579, 0.214743511])

model = LinearRegression()
model.fit(x, y)
print(model.coef_)
print(model.intercept_)

Any insights are appreciated!

Note that GLM yields a non-zero slope when the axes are reversed (regressing x on y), and the result is then consistent with Python (and Excel); see the sketch below.
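For reference, the reversed fit is just the following (a minimal sketch, reusing the data DataFrame from the first post):

using DataFrames, GLM

# Regressing x on y instead of y on x -- this direction succeeds in GLM
model_rev = lm(@formula(x ~ y), data)
println(model_rev)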

Using MLJLinearModels.jl (https://github.com/JuliaAI/MLJLinearModels.jl) gives the same result as sklearn.
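For example, something like this with the direct (non-MLJ) API, reusing x and y from the first post; if I recall the API correctly, fit returns the coefficient vector with the bias as the last entry when fit_intercept=true (the default):

using MLJLinearModels

# Direct-API sketch: X must be a feature matrix, hence the reshape
lr = LinearRegression()               # fit_intercept=true by default
theta = fit(lr, reshape(x, :, 1), y)
# theta[1] is the slope, theta[end] the intercept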


Probably something like this: PosDefException with simple linear regression when x/y ranges are small (https://github.com/JuliaStats/GLM.jl/issues/375), i.e. a precision issue in the Cholesky factorization. Consider:

julia> lm(@formula(y ~ x), (y = y, x = x .* 1e10))
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 1 + x

Coefficients:
──────────────────────────────────────────────────────────────────────────────
                   Coef.  Std. Error      t  Pr(>|t|)    Lower 95%   Upper 95%
──────────────────────────────────────────────────────────────────────────────
(Intercept)  0.000773609  0.00275744   0.28    0.7843  -0.00529547  0.00684268
x            0.0154962    0.00042306  36.63    <1e-12   0.014565    0.0164273
──────────────────────────────────────────────────────────────────────────────

which is basically the same as:

julia> [ones(length(x)) x]\y
2-element Vector{Float64}:
 0.0007736085408234451
 1.549615493582457e8

once you account for the scaling of x.
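To see why the Cholesky route struggles here, note that it factorizes X'X, which squares the condition number of the design matrix. A quick check, reusing x from the first post (magnitudes are approximate):

using LinearAlgebra

X = [ones(length(x)) x]   # design matrix with intercept column
cond(X)                   # roughly 1e9: large but representable
cond(X' * X)              # roughly 1e19: squared, beyond 1/eps(Float64) ≈ 4.5e15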

EDITED to add: you can reduce the scaling factor all the way down to 1e2 here:

julia> lm(@formula(y ~ x), (y = y, x = x .* 1e2))
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 1 + x

Coefficients:
──────────────────────────────────────────────────────────────────────────────────
                   Coef.      Std. Error      t  Pr(>|t|)    Lower 95%   Upper 95%
──────────────────────────────────────────────────────────────────────────────────
(Intercept)  0.000773609      0.00275744   0.28    0.7843  -0.00529547  0.00684268
x            1.54962e6    42306.0         36.63    <1e-12   1.4565e6    1.64273e6
──────────────────────────────────────────────────────────────────────────────────
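In either case you recover the slope on the original x scale by multiplying by the scale factor (the intercept is unaffected):

julia> 1.54962e6 * 1e2   # slope from the 1e2 fit, undoing the scaling
1.54962e8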

This looks like the GLM.jl issue Incorrect linear regression results (https://github.com/JuliaStats/GLM.jl/issues/426). You can get the sklearn result by calling lm(@formula(y ~ x), data; dropcollinear=false).
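For completeness, on the data from the first post that would look like this (a minimal sketch; the expected values in the comment are the ones the backslash solve above produced, which this call should reproduce):

# Disabling the collinearity check makes the Cholesky fit match sklearn/Excel
model = lm(@formula(y ~ x), data; dropcollinear=false)
coef(model)   # expected: slope ≈ 1.54962e8, intercept ≈ 7.74e-4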


I should have also said that QR decomposition is already available on GLM master:

julia> lm(@formula(y ~ x), (;y, x); method = :qr)
LinearModel

y ~ 1 + x

Coefficients:
──────────────────────────────────────────────────────────────────────────────
                   Coef.  Std. Error      t  Pr(>|t|)    Lower 95%   Upper 95%
──────────────────────────────────────────────────────────────────────────────
(Intercept)  0.000773609  0.00275744   0.28    0.7843  -0.00529547  0.00684268
x            1.54962e8    4.2306e6    36.63    <1e-12   1.4565e8    1.64273e8
──────────────────────────────────────────────────────────────────────────────

I am getting this error message for this solution:

MethodError: no method matching fit(::Type{LinearModel}, ::Matrix{Float64}, ::Vector{Float64}, ::Nothing; method=:OLS)
Closest candidates are:
  fit(::Type{LinearModel}, ::AbstractMatrix{<:Real}, ::AbstractVector{<:Real}, ::Union{Nothing, Bool}; wts, dropcollinear) at C:\Users\santi\.julia\packages\GLM\6ycOV\src\lm.jl:134 got unsupported keyword argument "method"

I am using GLM v1.8.3 in Julia 1.9.1 – I updated GLM but I guess I am not using the latest version yet.

As I said, this is currently on master only; it was planned for 2.0, but there is a backport PR for 1.x which isn't merged yet. If you want to use it now, you'll have to add GLM#master.
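That is, either from the Pkg REPL or programmatically:

using Pkg
Pkg.add(url="https://github.com/JuliaStats/GLM.jl", rev="master")
# or equivalently, in the Pkg REPL:
#   pkg> add GLM#master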

Got it, thanks!
