The main point is that OLS on a singular system is not well defined. As I have told you numerous times, the coefficients are not estimable. Entirely arbitrary choices must be made to obtain any coefficients at all. I have described Julia's choices and Python's choices, and how you can mimic Python's choices in Julia. The coefficients you obtain are relative to these choices, and are typically not meaningful without analyzing the nature of the singularity (often called multicollinearity in this context). The model as such can still be used for prediction, though.
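To make the non-estimability concrete, here is a minimal Python sketch (standard library only, toy data chosen for illustration) with a design matrix whose third column duplicates the second. Quite different coefficient vectors produce exactly the same fitted values, so the data cannot tell them apart. The evenly split vector is the minimum-norm one among them, which is the kind of choice a pseudoinverse-based solver makes.

```python
# Toy design matrix (illustrative): intercept column plus the same regressor twice,
# so column 3 duplicates column 2 and X'X is singular.
X = [
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0],
    [1.0, 3.0, 3.0],
    [1.0, 4.0, 4.0],
]

def predict(X, beta):
    """Return the fitted values X * beta as a list."""
    return [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]

# Three different coefficient vectors; only the sum beta[1] + beta[2] is pinned down.
candidates = {
    "all weight on column 2": [0.5, 2.0, 0.0],
    "all weight on column 3": [0.5, 0.0, 2.0],
    "split evenly (minimum norm)": [0.5, 1.0, 1.0],
}

for name, beta in candidates.items():
    print(f"{name:30s} beta={beta} fitted={predict(X, beta)}")
```

All three lines print the same fitted values; only the sum of the last two coefficients is identified by the data.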
And, of course, as always with floating point operations, results must be compared up to a precision; there are no meaningful exact comparisons of floating point numbers, except in some very simple cases. The problem at hand, computing a pseudoinverse, involves inverting only the singular values that are not too small, so one must choose what "too small" means. That choice affects the pseudoinverse, and consequently the computed coefficients. Julia and Python may use slightly different parameters for computing the pseudoinverse, and even slightly different ways of computing the singular value decomposition. I don't know the details.
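A small Python sketch of both points, using only the standard library. The first part shows why exact equality is the wrong test and compares coefficient vectors up to a tolerance instead (the tolerances are illustrative, not recommendations). The second part uses a diagonal matrix, whose singular values are just its diagonal entries, to show how the cutoff for "too small" changes the pseudoinverse. The relative-cutoff rule in the hypothetical helper `pinv_diagonal` is similar in spirit to the tolerance keywords of Julia's `pinv` and NumPy's `numpy.linalg.pinv`, but it is not claimed to match either library's defaults.

```python
import math

# Exact equality of floats is fragile: the two sides differ in the last bit.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True (default rel_tol=1e-09)

def coefficients_close(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two coefficient vectors elementwise up to illustrative tolerances."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )

def pinv_diagonal(singular_values, rtol):
    """Hypothetical helper: pseudoinverse of a diagonal matrix with nonnegative
    diagonal entries. Entries at or below rtol * max(entry) are treated as zero
    and are not inverted."""
    cutoff = rtol * max(singular_values)
    return [1.0 / s if s > cutoff else 0.0 for s in singular_values]

s = [2.0, 1e-3, 1e-17]
print(pinv_diagonal(s, rtol=1e-15))  # [0.5, 1000.0, 0.0]
print(pinv_diagonal(s, rtol=1e-2))   # [0.5, 0.0, 0.0]
```

The same matrix gives a different pseudoinverse depending on the cutoff, so two packages with different defaults can legitimately return different coefficients.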
Also, in the Python mimicry I multiplied the covariance matrix by `Xc' * y` to find the coefficients, because I needed the covariance matrix anyway to compute the standard errors. It is more common to solve the system `Xc' * Xc * x = Xc' * y` for `x`, which is mathematically the same but typically uses a different method, e.g. a pivoted Cholesky decomposition, a QR decomposition, or something similar. This yields slightly different estimates, differing in the 17th digit or so. What exactly Python does, I don't know. Fortunately, it doesn't matter, because of the following.
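For comparison, a Python sketch of the two routes on a small, full-rank toy problem (data invented for illustration): multiplying the explicit inverse of `X'X` by `X'y`, versus solving the normal equations with a plain, unpivoted Cholesky factorization. This is not what Julia's or Python's regression packages do internally; it only shows that two mathematically equivalent routes can disagree in the last bits, so their results should be compared with a tolerance.

```python
import math

# Illustrative full-rank data: intercept plus one regressor.
X = [[1.0, 0.3], [1.0, 1.7], [1.0, 2.9], [1.0, 4.1], [1.0, 5.6]]
y = [1.1, 2.4, 3.9, 4.8, 6.7]

def xtx_and_xty(X, y):
    """Form X'X and X'y explicitly."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return XtX, Xty

XtX, Xty = xtx_and_xty(X, y)

# Route 1: explicit inverse of the 2x2 matrix X'X, then multiply by X'y.
(a, b), (c, d) = XtX
det = a * d - b * c
inv = [[d / det, -b / det], [-c / det, a / det]]
beta_inverse = [inv[0][0] * Xty[0] + inv[0][1] * Xty[1],
                inv[1][0] * Xty[0] + inv[1][1] * Xty[1]]

def cholesky_solve(A, rhs):
    """Solve A x = rhs for symmetric positive definite A via A = L L'
    (unpivoted Cholesky), then forward and back substitution."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n  # forward substitution: L z = rhs
    for i in range(n):
        z[i] = (rhs[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n  # back substitution: L' x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Route 2: solve the normal equations X'X beta = X'y directly.
beta_cholesky = cholesky_solve(XtX, Xty)

for b1, b2 in zip(beta_inverse, beta_cholesky):
    print(repr(b1), repr(b2), "difference:", b1 - b2)
```

Any difference printed in the last column is at the level of rounding error, far below anything statistically meaningful.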
If your program relies on exact floating point values that vary with more or less arbitrary choices made by some package, you are doing it the wrong way.
Now, what you are computing is known as the "t-score" or "t-value": how many standard errors your estimate is away from zero. This is a sensible thing to compute, and it is used to judge whether the results are statistically significant. When the t-score is small (how small depends on your application), the coefficient is indistinguishable from zero with the data you have. The difference between `0.001` and `-0.001`, with a standard error around `1`, is completely irrelevant.
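As a tiny illustration in Python, the t-score is just the estimate divided by its standard error. The cutoff of 2 below is only an illustrative rule of thumb (roughly the two-sided 5% level for large samples), not a universal threshold.

```python
def t_score(estimate, standard_error):
    """Number of standard errors the estimate lies away from zero."""
    return estimate / standard_error

# Illustrative threshold only; the appropriate cutoff depends on the application.
THRESHOLD = 2.0

for est in (0.001, -0.001):
    t = t_score(est, 1.0)
    print(f"estimate={est:+.3f}  t={t:+.3f}  small (|t| < {THRESHOLD}): {abs(t) < THRESHOLD}")
```

Both estimates give a t-score of about 0.001 in magnitude, so neither sign carries any information.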