Efficient way of doing linear regression

stevengj · January 14, 2020, 8:17pm

Sorry to come late to this thread, but I’m surprised that no one seems to have given the correct explanation here for the performance differences between various methods to solve \min_x \Vert Ax-b\Vert_2.

x̂ = (A'A)\(A'b) is the “normal equations” approach. (You could also use cholesky(Hermitian(A'A))\(A'b) to explicitly take advantage of the fact that A'A is SPD). This (and variations thereof) is the fastest approach, but it is also the least accurate. In particular, it squares the condition number of A, so it doubles the number of digits you lose to roundoff errors and similar. You should only use this method if you know that you have a well-conditioned A.
x̂ = A \ b is equivalent to qr(A, Val{true}()) \ b — it uses a pivoted QR factorization. This is slower than the normal-equations approach, but it is much more accurate for badly conditioned matrices because it doesn’t square the condition number.
x̂ = pinv(A) * b uses the SVD of A to apply the pseudo-inverse. This is the slowest method, but it gives you the most control over accuracy for ill-conditioned matrices because you can specify a tolerance in pinv to regularize the problem by dropping noisy singular values (though a very similar result can also be accomplished by Tikhonov regularization). Update: using the pseudo-inverse by an explicit call to pinv may not be backwards-stable (Liu & Barnett, 2016, and references therein) — it is better to use x̂ = svd(A) \ b, which computes \hat{x} = \hat{V} (\hat{\Sigma}^+ (\hat{U}^* b)), though you currently need to do the latter manually if you want to pass a tolerance.

QR is the default choice used for A \ b (in Julia and many other systems) because it is reliable without being too slow.

Topic		Replies	Views
GLM is slow on large datasets. Using OnlineStats for regressions? MixedModels? Performance glm	25	5091	November 26, 2018
Best way for linear regression problem on product features General Usage performance , regression , qr	28	297	April 11, 2025
[ANN] LinearRegressionKit Package Announcements package , regression , linear-regression	9	2668	November 22, 2022
What is the equivalent Julia code for statsmodels.api.OLS Statistics python	15	466	March 26, 2024
The simplest linear fit with GLM Tooling glm	13	5277	November 11, 2021

Efficient way of doing linear regression

Related topics