[ANN] LinearRegressionKit

stevengj · November 19, 2022, 4:10pm

Do you have a reference? Is there a quantitative analysis somewhere? The condition number seems to be the same as for Cholesky, so at best it seems you would gain a small constant factor.

How is this different from QR?

Yes. In particular, since it squares the condition number, the loss of accuracy depends on the conditioning of your matrix. For example, if your matrix has a condition number of 100, it loses about 2 extra digits to roundoff error compared to QR, but with a condition number of 10^8 you lose about 8 extra digits. See also this discussion: Efficient way of doing linear regression - #33 by stevengj
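To make the 10^8 case concrete, here is a minimal sketch in dependency-free Python (not code from LinearRegressionKit or any other package). It assumes a hand-picked 3x2 matrix with condition number of roughly 1.4 × 10^8, and it uses modified Gram-Schmidt as a simple stand-in for the Householder QR that libraries normally use. The helper `dot` and all variable names are made up for illustration.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

eps = 1e-8
# Columns of a 3x2 matrix A with condition number about sqrt(2)/eps ~ 1.4e8.
a1 = [1.0, eps, 0.0]
a2 = [1.0, 0.0, eps]
# Right-hand side b = A * [1, 1], so the exact least-squares solution is [1, 1].
b = [2.0, eps, eps]

# Normal equations: form the Gram matrix A'A, whose condition number is ~2e16.
g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
# 1 + eps^2 rounds to 1.0 in double precision, so the determinant is exactly 0.0.
gram_det = g11 * g22 - g12 * g12

# QR by modified Gram-Schmidt, orthogonalizing b along with the columns of A.
r11 = math.sqrt(dot(a1, a1))
q1 = [x / r11 for x in a1]
r12 = dot(q1, a2)
v = [x - r12 * y for x, y in zip(a2, q1)]
r22 = math.sqrt(dot(v, v))
q2 = [x / r22 for x in v]
c1 = dot(q1, b)
b_rest = [x - c1 * y for x, y in zip(b, q1)]
c2 = dot(q2, b_rest)

# Back-substitution on the 2x2 upper-triangular system R x = c.
x2 = c2 / r22
x1 = (c1 - r12 * x2) / r11

print("det(A'A) in double precision:", gram_det)
print("QR solution:", [x1, x2])
```

The Gram matrix comes out exactly singular in double precision, so a Cholesky or sweep step on it has nothing left to work with, whereas the QR route working on A itself still recovers the solution to close to full precision.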

The reason why libraries often default to QR is that it can be hard for users to predict (or comprehend) whether their problem is well-conditioned, and reliability is prioritized over speed.

(I’ve also seen references to the ability of the sweep approach to incrementally add additional columns/variables, but of course you can update QR incrementally as well.)
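As a rough illustration of adding a variable to an existing QR factorization, the sketch below appends one column at a time using modified Gram-Schmidt. This is an assumption-laden toy: `append_column` is a hypothetical helper, the columns are made-up data, and library implementations would typically use Householder or Givens updates instead.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def append_column(Q, R, a):
    """Extend a thin QR factorization by one column (modified Gram-Schmidt).

    Q is a list of orthonormal columns; R is a list of columns of the
    upper-triangular factor (column j holds j+1 entries). Both are updated
    in place, and the existing columns are not recomputed.
    """
    v = list(a)
    coeffs = []
    for q in Q:
        r = dot(q, v)                      # component of the new column along q
        coeffs.append(r)
        v = [x - r * y for x, y in zip(v, q)]
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("new column is linearly dependent on existing ones")
    Q.append([x / norm for x in v])
    R.append(coeffs + [norm])

# Build the factorization one variable at a time (intercept, t, t^2).
Q, R = [], []
columns = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0]]
for col in columns:
    append_column(Q, R, col)
```

Each call costs work proportional to the number of rows times the number of existing columns, and the earlier columns of Q and R are left untouched.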
