First, QR for least-squares is specialized to the problem — you apply QR to A, not to A^T A (which you never compute).
Second, while QR on A is generally slower than forming A^T A and doing Cholesky (the “normal equations” approach), the latter can be much less accurate when A is badly conditioned (i.e. the columns are nearly linearly dependent, so A^T A is nearly singular), because forming A^T A squares the condition number. If you care about getting the right answer, and not just the fastest answer, I would use QR.
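To make the accuracy difference concrete, here is a rough sketch in plain Python (standard library only). The function names `lstsq_qr` and `lstsq_normal` and the test matrix are my own illustrative choices, not anything from a library, and in practice you would call a tuned library QR rather than hand-rolled loops. The matrix has two nearly parallel columns, chosen so that the computed A^T A comes out exactly singular in double precision.

```python
import math

def lstsq_qr(A, b):
    """Least squares via Householder QR applied to A itself (A^T A is never formed)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    y = b[:]
    for k in range(n):
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            raise ValueError("A is rank-deficient")
        alpha = -math.copysign(norm, R[k][k])
        # Householder vector v for column k; H = I - 2 v v^T / (v^T v)
        v = [R[i][k] for i in range(k, m)]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        # Apply H to the remaining columns of R and to the right-hand side
        for j in range(k, n):
            s = 2.0 * sum(v[i] * R[k + i][j] for i in range(m - k)) / vtv
            for i in range(m - k):
                R[k + i][j] -= s * v[i]
        s = 2.0 * sum(v[i] * y[k + i] for i in range(m - k)) / vtv
        for i in range(m - k):
            y[k + i] -= s * v[i]
    # Back substitution on the upper-triangular system R x = (Q^T b)[:n]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

def lstsq_normal(A, b):
    """Least squares via the normal equations: Cholesky of A^T A."""
    m, n = len(A), len(A[0])
    G = [[sum(A[i][p] * A[i][q] for i in range(m)) for q in range(n)] for p in range(n)]
    c = [sum(A[i][p] * b[i] for i in range(m)) for p in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = G[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("computed A^T A is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    # Forward solve L z = c, then back solve L^T x = z
    z = [0.0] * n
    for i in range(n):
        z[i] = (c[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Illustrative badly conditioned problem: two nearly parallel columns.
eps = 1e-8
A = [[1.0, 1.0], [eps, 0.0], [0.0, eps]]
b = [2.0, eps, eps]  # b = A times [1, 1], so the exact solution is [1, 1]

x_qr = lstsq_qr(A, b)
try:
    x_ne = lstsq_normal(A, b)
except ValueError:
    x_ne = None  # Cholesky breaks down because 1 + eps^2 rounds to 1
print("QR:", x_qr)
print("normal equations:", x_ne)
```

Here the QR route recovers a solution close to [1, 1], while the normal-equations route fails outright because 1 + eps^2 rounds to 1 and the computed A^T A has no Cholesky factorization. On less extreme problems the normal equations do not fail, they just lose roughly twice as many digits.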
See also Efficient way of doing linear regression - #33 by stevengj