I'm assuming OP is using GLM with the additional poly function described in the docs here.
Do I understand the question correctly as being about whether GLM.jl can replicate the coefficient values given on the certification page you are linking? If so, it appears it can't:
julia> lm(@formula(y ~ x + x^2 + x^3 + x^4 + x^5 + x^6 + x^7 + x^8 + x^9 + x^10), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}
y ~ 1 + x + :(x ^ 2) + :(x ^ 3) + :(x ^ 4) + :(x ^ 5) + :(x ^ 6) + :(x ^ 7) + :(x ^ 8) + :(x ^ 9) + :(x ^ 10)
Coefficients:
───────────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────────────────────────
(Intercept) 0.0 NaN NaN NaN NaN NaN
x 0.0 NaN NaN NaN NaN NaN
x ^ 2 0.0 NaN NaN NaN NaN NaN
x ^ 3 0.0 NaN NaN NaN NaN NaN
x ^ 4 0.0 NaN NaN NaN NaN NaN
x ^ 5 0.0 NaN NaN NaN NaN NaN
x ^ 6 0.00368644 0.000219028 16.83 <1e-26 0.0032503 0.00412258
x ^ 7 0.00191731 0.000127438 15.05 <1e-23 0.00166355 0.00217107
x ^ 8 0.000375885 2.75293e-5 13.65 <1e-21 0.000321067 0.000430703
x ^ 9 3.28191e-5 2.61667e-6 12.54 <1e-19 2.76087e-5 3.80296e-5
x ^ 10 1.07467e-6 9.23624e-8 11.64 <1e-17 8.90753e-7 1.25859e-6
───────────────────────────────────────────────────────────────────────────────────────
Feeding the same data into R, I get:
Call:
lm(formula = y ~ x + I(x^2) + I(x^3) + I(x^4) + I(x^5) + I(x^6) +
I(x^7) + I(x^8) + I(x^9) + I(x^10), data = df)
Residuals:
Min 1Q Median 3Q Max
-0.0099087 -0.0024610 0.0003385 0.0020743 0.0071654
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.743e+02 8.756e+01 -1.990 0.050345 .
x -3.269e+02 1.480e+02 -2.208 0.030436 *
I(x^2) -2.661e+02 1.095e+02 -2.429 0.017617 *
I(x^3) -1.239e+02 4.652e+01 -2.664 0.009534 **
I(x^4) -3.638e+01 1.251e+01 -2.907 0.004845 **
I(x^5) -6.979e+00 2.211e+00 -3.156 0.002333 **
I(x^6) -8.747e-01 2.567e-01 -3.407 0.001079 **
I(x^7) -6.906e-02 1.890e-02 -3.654 0.000487 ***
I(x^8) -3.118e-03 8.009e-04 -3.894 0.000219 ***
I(x^9) -6.139e-05 1.489e-05 -4.123 9.91e-05 ***
I(x^10) NA NA NA NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.003768 on 72 degrees of freedom
Multiple R-squared: 0.9958, Adjusted R-squared: 0.9953
F-statistic: 1895 on 9 and 72 DF, p-value: < 2.2e-16
which seems somewhat closer but not quite there.
Maybe the problem is nearly rank-deficient, in which case GLM might behave differently from other software; there might not be a single "correct" approach to this issue always and everywhere. In this case, though, it appears that there is an "official" correct solution, so maybe @nalimilan is interested in having a look at what's happening here.
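To see how close to rank-deficient a raw degree-10 monomial design is, here's a small self-contained sketch (the grid below is made up for illustration, not OP's actual data):

```julia
# Hypothetical sketch: the condition number of a degree-10 monomial
# (Vandermonde) design matrix explodes even on a modest grid.
using LinearAlgebra

xs = range(-9, -3; length = 82)    # made-up grid, not the actual dataset
V  = [x^k for x in xs, k in 0:10]  # columns 1, x, x^2, …, x^10
cond(V)                            # astronomically large: the columns are nearly collinear
```

With a condition number this large, tiny differences in how each package pivots or regularizes the least-squares solve can produce wildly different coefficient estimates, which is consistent with the disagreement above.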
Fitting polynomials by solving for the coefficients of monomials is generally not a good idea. It generates poorly conditioned matrices and thus large errors in the coefficients, because the shapes of the curves x^n are very similar. Instead it's better to express the polynomial as a linear combination of Chebyshev polynomials or Legendre polynomials, which are orthogonal to each other (with respect to an inner product over the relevant real interval with a suitable weight function, and at least strongly linearly independent when discretized).
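As a hedged illustration of the basis-change idea (synthetic data, and assuming Polynomials.jl, which supports fitting in the Chebyshev basis):

```julia
# Sketch: fit in the Chebyshev basis rather than with raw monomials.
using Polynomials

xs = range(-1, 1; length = 100)     # Chebyshev polynomials are orthogonal on [-1, 1]
ys = [x^3 - 2x + 1 for x in xs]     # synthetic data from a known cubic
c  = fit(ChebyshevT, xs, ys, 10)    # least-squares fit in the Chebyshev basis
c(0.5)                              # ≈ 0.5^3 - 2*0.5 + 1 = 0.125
```

For data on an arbitrary interval [a, b], you would first map it affinely onto [-1, 1] before fitting.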
This isn't an esoteric problem. If you're having trouble getting good fits to a 10th-order polynomial with on the order of 10 data points, this is likely your problem.
Not much point continuing the discussion without OP, but if I understand their post and the linked page correctly, this is a deliberately unstable problem which is used to validate statistical software against some known result. Of course I don't know how that known result was derived, or whether it's in any way better than what R or GLM.jl give, but I think there's little point arguing with the problem formulation.
which are within the bounds given, on first glance. The ArnoldiFit is a better match to the given coefficients, though:
julia> maximum(abs(pa(x) - q(x)) for x ∈ range(0,1, length=1000))
7.275957614183426e-12
julia> maximum(abs(p(x) - q(x)) for x ∈ range(0,1, length=1000))
0.00025997101602115436
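For anyone following along, the comparison above can presumably be reproduced along these lines (the grid and the "certified" coefficient vector `as` below are synthetic stand-ins; in the thread they come from the linked page):

```julia
# Sketch of the p / pa / q comparison with stand-in data.
using Polynomials

x  = collect(range(0, 1; length = 82))            # stand-in grid, not the real data
as = [1.0, -0.5, 0.25, 0, 0, 0, 0, 0, 0, 0, 0.1]  # made-up "certified" coefficients
q  = Polynomial(as)                               # reference polynomial from the coefficients
y  = q.(x)                                        # stand-in responses
p  = Polynomials.fit(x, y, 10)                    # ordinary Vandermonde least squares
pa = Polynomials.fit(ArnoldiFit, x, y, 10)        # Vandermonde-with-Arnoldi fit compared above
maximum(abs(pa(t) - q(t)) for t in range(0, 1; length = 1000))
```

The Arnoldi variant orthogonalizes the basis against the sample points as it goes, which is why it tracks the reference coefficients so much more closely on this ill-conditioned problem.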
The only thing I changed was that I had to use p = Polynomials.fit(x, y, 10) and then q = Polynomial(as); when I then ran maximum(abs(p(x) - q(x)) for x ∈ range(0, 1, length=1000)), the result was 0.0003723332774825394.
I can't tell you how grateful I am! Thank you so, so much.