In exact arithmetic (no rounding errors etc), the SVD of A is equivalent to computing the eigenvalues and eigenvectors of AᵀA. However, computing the “covariance” matrix AᵀA squares the condition number, i.e. it doubles the number of digits that you lose to roundoff errors. Of course, if you start with 15 digits and your matrix has a condition number of 10², so that you are only losing about 2 digits, then squaring the condition number to 10⁴ still leaves you with 10+ digits. But if your matrix is badly conditioned then you could have a big problem, especially if you have inaccurate data to start with. That’s why SVD methods typically work by bidiagonalization or similar methods that avoid forming AᵀA explicitly.
However, for PCA my understanding is that there is a saving grace, because in PCA you are usually only interested in a few largest singular values. In particular (covered in e.g. lecture 31 of Trefethen and Bau Numerical Linear Algebra), computing the k-th singular value σₖ via AᵀA increases the errors (compared to bidiagonalization-based SVD) by a factor of ≈ σ₁/σₖ where σ₁ is the largest singular value. In consequence, as long as you are only interested in singular values and vectors close in magnitude to σ₁, as is typical in PCA, then working with the covariance matrix is not so bad.
See also the discussion in https://github.com/JuliaLang/julia/pull/21701 regarding the
svds function, which is typically used to compute only a few largest singular values (like for PCA) and hence the AᵀA method is attractive for computational efficiency despite the potential accuracy loss.