[ANN] LinearRegressionKit

Do you have a reference? Is there a quantitative analysis somewhere? The condition number seems to be the same as with Cholesky, so at best it seems you would gain a small constant factor.

How is this different from QR?

Yes. In particular, since it squares the condition number, how much accuracy you lose depends on the conditioning of your matrix. E.g. if your matrix has a condition number of 100, you lose about 2 extra digits to roundoff errors compared to QR, but if it has a condition number of 10^8, you lose about 8 extra digits. See also this discussion: Efficient way of doing linear regression - #33 by stevengj
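One way to see the squared condition number directly is to solve the same ill-conditioned least-squares problem both ways and compare. The sketch below is a NumPy illustration under assumed choices (a monomial-basis polynomial fit with hypothetical sizes m = 100, n = 8). It is not LinearRegressionKit's code, and the helper names are made up for the example.

```python
import numpy as np

def lstsq_normal_equations(A, b):
    """Least squares via the normal equations A^T A x = A^T b, solved with Cholesky."""
    G = A.T @ A                      # Gram matrix: cond(G) = cond(A)^2
    L = np.linalg.cholesky(G)        # G = L L^T, L lower triangular
    # Two triangular solves (np.linalg.solve is generic; used here for brevity).
    z = np.linalg.solve(L, A.T @ b)
    return np.linalg.solve(L.T, z)

def lstsq_qr(A, b):
    """Least squares via the thin QR factorization A = Q R (never forms A^T A)."""
    Q, R = np.linalg.qr(A)           # default mode="reduced": Q is m x n, R is n x n
    return np.linalg.solve(R, Q.T @ b)

def polyfit_problem(m=100, n=8):
    """Hypothetical test case: monomial-basis polynomial fit on [0, 1]."""
    t = np.linspace(0.0, 1.0, m)
    A = np.vander(t, n, increasing=True)  # columns 1, t, t^2, ..., t^(n-1)
    x_true = np.ones(n)
    b = A @ x_true                         # consistent data: solution is x_true (up to rounding in b)
    return A, b, x_true

def rel_err(x, x_true):
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

if __name__ == "__main__":
    A, b, x_true = polyfit_problem()
    print(f"cond(A)     = {np.linalg.cond(A):.1e}")
    print(f"cond(A^T A) = {np.linalg.cond(A.T @ A):.1e}")
    print(f"normal equations rel. error: {rel_err(lstsq_normal_equations(A, b), x_true):.1e}")
    print(f"QR rel. error:               {rel_err(lstsq_qr(A, b), x_true):.1e}")
```

Comparing the base-10 exponents of the two errors against log10 of cond(A) gives a rough check of the "extra digits lost" rule of thumb; the exact figures will vary with the BLAS/LAPACK build.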

The reason why libraries often default to QR is that it can be hard for users to predict (or comprehend) whether their problem is well-conditioned, and reliability is prioritized over speed.
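To put a rough number on the speed side of that trade-off: for a dense least-squares problem with an m × n design matrix, the standard textbook operation counts (e.g. Golub and Van Loan, Matrix Computations) are, to leading order:

$$
\underbrace{\left(m + \tfrac{n}{3}\right) n^2}_{\text{form } X^T X \text{, then Cholesky}}
\qquad \text{vs.} \qquad
\underbrace{2 n^2 \left(m - \tfrac{n}{3}\right)}_{\text{Householder QR}}
\quad \text{flops,}
$$

so for m ≫ n, QR costs roughly twice as much. These counts ignore memory traffic and BLAS efficiency, so measured timings can differ.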

(I’ve also seen references to the ability of the sweep approach to add columns/variables incrementally, but of course you can update QR incrementally as well.)
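For reference, here is a minimal sketch of the textbook sweep operator in Goodnight's formulation (an assumption; LinearRegressionKit's own implementation may differ). Sweeping pivot k of the augmented cross-product matrix [X y]^T [X y] brings variable k into the model, so variables can be added one at a time, and sweeping the same pivot again removes it. Because it operates on X^T X, it inherits the same squared condition number as the normal equations. The function name `sweep` and the simulated data are hypothetical.

```python
import numpy as np

def sweep(M, k):
    """Sweep pivot k of a square matrix (Goodnight's formulation); returns a new array.

    No check is made for (near-)zero pivots, i.e. collinear columns.
    """
    M = np.array(M, dtype=float)      # work on a copy
    d = M[k, k]
    M[k, :] /= d                      # 1. divide the pivot row by the pivot
    for i in range(M.shape[0]):
        if i != k:
            b = M[i, k]
            M[i, :] -= b * M[k, :]    # 2. eliminate column k from row i
            M[i, k] = -b / d
    M[k, k] = 1.0 / d                 # 3. replace the pivot by its reciprocal
    return M

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_obs, p = 50, 3
    # Hypothetical data: intercept plus two random predictors.
    X = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, p - 1))])
    y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=n_obs)

    Z = np.column_stack([X, y])
    M = Z.T @ Z                       # augmented cross-product [X y]^T [X y]

    for k in range(p):                # add one variable per sweep
        M = sweep(M, k)
        beta = M[: k + 1, p]          # coefficients of the model using columns 0..k
        rss = M[p, p]                 # residual sum of squares of that model
        print(f"{k + 1} column(s): beta = {beta}, RSS = {rss:.4f}")

    M_removed = sweep(M, p - 1)       # sweeping the same pivot again removes that variable
    print("after removing the last column:", M_removed[: p - 1, p])
```

After sweeping every predictor pivot, the top-left block holds (X^T X)^{-1}, which is also what standard errors are built from. On the QR side, SciPy offers `scipy.linalg.qr_insert` (with `which='col'`) for the analogous column update.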
