OLS: difference between \ and normal equation (and inv or pinv)

Hello, I am hoping to reference this in an upcoming article on JuliaActuary. Is there something fundamentally different in the above code vs ols(x,y) = x \ y that would make this syntactically shorter version able to be used but without distorting the message?

It seems to me ols(x,y) = inv(x'x)x'y is a bit questionable example to use. I don’t think compilers are smart enough to figure out that this really should be implemented as ols(x,y) = (x'x)\x'y, or even better with something like QR without forming the normal matrix. It looks neat, but it really isn’t.


is there another succinct example that you think captures the same ideas but doesn’t have the issues that you point out?

I think this is neat:

n = 13;
x = rand(n); # independent variable
y = 6 * x + 0.71 * (rand(n) .- 0.5)/2; # fake data: note the slope of 6, random noise
A = [x ones(n)]; # coeff matrix of system of too many equations for linear fit
A \ y # solve for the fit parameters using least squares; note the slope ~ 6
2-element Vector{Float64}:

:slight_smile: Julia is clever.

I wonder if that could be figured out though? The adjoint is already lazy, \ has methods for QR or LU factorizations, and there is LazyArrays.jl. Maybe it’s not so far-fetched to make A'* and inv lazy, so that the pseudo-inverse is automatically recognized and does the right thing. It seems like inv(A'A)*, and maybe the whole pattern (x'x)\x' is lazy-able, as well as right pseudo-inverse. That laziness might do operations in place, so you don’t have to muck up the beautiful * syntax with mul!. (There’s probably already an old thread on this in discourse, but I couldn’t find it.)

In teaching, I’ve taken to avoid writing A^{-1}b on the whiteboard, because when students implement it in code they always do inv. So I just write \ as if it’s math (after checking there are no real mathematicians around). I think Matlab’s introduction of \ and overloading it with pseudo-inverse was a true stroke of genius. But assuming the world isn’t going to adopt \, maybe Julia can cleverly do the right thing with inv(A)*b and pseudo-inverses.


So this one has the advantage of being smarter about the operations due to multiple dispatch than the original? Is that because the two (potentially different) types are called with a single method instead of intermediate methods like inv, which operates only on the first argument/type first?

The intention was not to provide a fully optimized form of least squares, which would indeed look a lot more involved. Ideally, you should first compute the LU or QR decomposition, for example, followed by ldiv!, and everything should also be done in-place.

The point was just to provide a simple example where you get lots of specialization for free. This is in fact the case with ols(x,y) = inv(x'x)x'y, as can be seen by simply observing the dozens of specializations provided by LinearAlgebra for inv, adjoint, and *.

To answer @Alec_Loudenback’s original question, I see no reason why you couldn’t use the shorter form \ and make the same point. I used the slightly longer form because it involved more than one operation, which I thought might be more illustrative.


I get it. I just cringe when I see inv in implementation (not in pseudocode).
So how about ols(x, y) = pinv([x ones(length(x))]) * y?


Yep, that works too!

I also like the added bonus of not needing to pre-pad x with the intercept column :+1:.


Depending on the goals of your article, it might make sense to instead show a little bit more about factorizations/etc. which are connected to the multiple dispatch in this case. These are at the heart of all of these operations, especially if you think “users” would care about big problems or end up solving least squares on the same matrix with different RHSs.

See Numerical Linear Algebra and Factorizations – Quantitative Economics with Julia for some stuff on these factorizations, and Krylov Methods and Matrix Conditioning – Quantitative Economics with Julia

There are no other languages like julia for making using more “advanced” algorithms accessible. And advanced is in the eye of the beholder. To many people, the idea that you would solve least squares without manually forming the normal euqations is tough to imagine (as is the subtlety of why it is slower and/or could have catastrophic numerical error is another issue entirely). By using good packages and algorithms, people can stay more ignorant of these issues.

But, as I said, it all depends on what your goals are. If this is narrowly talking about multiple dispatch in that situation, then I suggest showing the OLS with a sparse vs. a dense matrix.


Thanks, all. Seems like the original example is a bit suboptimal, not because of multiple dispatch, but because there are better alternatives to calling inv for this problem which are more directly performing the optimized maths.