Problems with simple linear model (R and Julia comparison)

slwu89 · March 21, 2025, 3:15pm

Hi all, I’m trying to reproduce some R code. Can someone help me figure out why the difference in these polynomial regressions is so different between Julia and R? Yes, a 9th order polynomial fit is not practical; I’m trying to reproduce a figure from an R-based textbook.

Julia:

using RDatasets, DataFrames
using GLM
mcycle = dataset("MASS", "mcycle")
lm = fit(LinearModel, @formula(Accel ~ Times + Times^2 + Times^3 + Times^4 + Times^5 + Times^6 + Times^7 + Times^8 + Times^9), mcycle)

Gives

Accel ~ 1 + Times + :(Times ^ 2) + :(Times ^ 3) + :(Times ^ 4) + :(Times ^ 5) + :(Times ^ 6) + :(Times ^ 7) + :(Times ^ 8) + :(Times ^ 9)

Coefficients:
────────────────────────────────────────────────────────────────────────────────────────
                    Coef.     Std. Error       t  Pr(>|t|)      Lower 95%      Upper 95%
────────────────────────────────────────────────────────────────────────────────────────
(Intercept)   0.0          NaN            NaN       NaN     NaN            NaN
Times         0.0          NaN            NaN       NaN     NaN            NaN
Times ^ 2     0.0          NaN            NaN       NaN     NaN            NaN
Times ^ 3     0.0          NaN            NaN       NaN     NaN            NaN
Times ^ 4     0.0          NaN            NaN       NaN     NaN            NaN
Times ^ 5     0.0          NaN            NaN       NaN     NaN            NaN
Times ^ 6     0.0          NaN            NaN       NaN     NaN            NaN
Times ^ 7     3.64577e-10    1.39406e-9     0.26    0.7941   -2.3934e-9      3.12255e-9
Times ^ 8    -1.3951e-11     5.54462e-11   -0.25    0.8017   -1.23645e-10    9.57425e-11
Times ^ 9     1.33146e-13    5.48143e-13    0.24    0.8085   -9.51289e-13    1.21758e-12
────────────────────────────────────────────────────────────────────────────────────────

Whereas in R we have:

> library(MASS)
> data(mcycle)
> lm(accel ~ poly(times, 9), data = mcycle)

Call:
lm(formula = accel ~ poly(times, 9), data = mcycle)

Coefficients:
    (Intercept)  poly(times, 9)1  poly(times, 9)2  poly(times, 9)3  
        -25.546          164.557          131.227         -239.790  
poly(times, 9)4  poly(times, 9)5  poly(times, 9)6  poly(times, 9)7  
         -6.738          245.799          -83.906         -153.596  
poly(times, 9)8  poly(times, 9)9  
        163.064           31.879

palday · March 21, 2025, 3:48pm

The model matrix is ill-conditioned (see all the issues related to QR vs Cholesky on the GLM.jl repository), but there is also another difference: poly in R by default gives orthogonal polynomials, not raw ones, so the estimates will be different. What happens if you do

lm(accel ~ poly(times, 9, raw=TRUE), data = mcycle)

in R?

palday · March 21, 2025, 3:59pm

Okay, with raw polynomials in R:

> summary(lm(accel ~ poly(times, 9, raw=TRUE), data=mcycle))

Call:
lm(formula = accel ~ poly(times, 9, raw = TRUE), data = mcycle)

Residuals:
    Min      1Q  Median      3Q     Max 
-92.250 -18.707  -0.545  19.571  54.739 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)  
(Intercept)                  3.467e+02  1.395e+02   2.486   0.0143 *
poly(times, 9, raw = TRUE)1 -2.288e+02  9.266e+01  -2.469   0.0149 *
poly(times, 9, raw = TRUE)2  4.963e+01  2.226e+01   2.230   0.0276 *
poly(times, 9, raw = TRUE)3 -4.811e+00  2.691e+00  -1.788   0.0762 .
poly(times, 9, raw = TRUE)4  2.271e-01  1.870e-01   1.214   0.2269  
poly(times, 9, raw = TRUE)5 -4.851e-03  7.926e-03  -0.612   0.5417  
poly(times, 9, raw = TRUE)6  1.025e-05  2.081e-04   0.049   0.9608  
poly(times, 9, raw = TRUE)7  1.460e-06  3.300e-06   0.442   0.6591  
poly(times, 9, raw = TRUE)8 -2.473e-08  2.892e-08  -0.855   0.3942  
poly(times, 9, raw = TRUE)9  1.283e-10  1.075e-10   1.194   0.2349

In Julia, if we disable rank deficiency checks:

julia> fit(LinearModel, @formula(Accel ~ Times + Times^2 + Times^3 + Times^4 + Times^5 + Times^6 + Times^7 + Times^8 + Times^9), mcycle, dropcollinear=false)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}

Accel ~ 1 + Times + :(Times ^ 2) + :(Times ^ 3) + :(Times ^ 4) + :(Times ^ 5) + :(Times ^ 6) + :(Times ^ 7) + :(Times ^ 8) + :(Times ^ 9)

Coefficients:
─────────────────────────────────────────────────────────────────────────────────────────
                     Coef.     Std. Error      t  Pr(>|t|)       Lower 95%      Upper 95%
─────────────────────────────────────────────────────────────────────────────────────────
(Intercept)   346.708       139.457         2.49    0.0143    70.6622       622.754
Times        -228.771        92.6506       -2.47    0.0149  -412.167        -45.3748
Times ^ 2      49.6301       22.2542        2.23    0.0276     5.57919       93.681
Times ^ 3      -4.81176       2.69039      -1.79    0.0762   -10.1372         0.513707
Times ^ 4       0.227133      0.186976      1.21    0.2268    -0.142975       0.59724
Times ^ 5      -0.0048522     0.00792551   -0.61    0.5415    -0.0205403      0.0108359
Times ^ 6       1.02888e-5    0.000208089   0.05    0.9606    -0.00040161     0.000422188
Times ^ 7       1.45897e-6    3.2998e-6     0.44    0.6592    -5.07279e-6     7.99072e-6
Times ^ 8      -2.47269e-8    2.8922e-8    -0.85    0.3942    -8.19762e-8     3.25224e-8
Times ^ 9       1.2828e-10    1.07472e-10   1.19    0.2349    -8.44551e-11    3.41015e-10
─────────────────────────────────────────────────────────────────────────────────────────

which is the same answer as in R.

slwu89 · March 21, 2025, 4:01pm

Ah thanks. Those differences are helpful to know. Because its so impractical in real use I never investigated the details of poly in R.

Topic		Replies	Views
GLM - inconsistency with R on ISLR dataset? Performance question , glm	2	571	January 30, 2022
Fitting a polynomial of degree 10 to data General Usage data , regression , polynomials	9	3395	November 4, 2021
GLM is slow on large datasets. Using OnlineStats for regressions? MixedModels? Performance glm	25	5187	November 26, 2018
Fail to do multiple linear regression using GLM Statistics regression , glm	7	1913	December 18, 2018
Creation of R formula with interpolated Julia Symbol (RCall) Statistics question	1	889	February 23, 2017

Problems with simple linear model (R and Julia comparison)

Related topics