When to use SVD and when to use Eigendecomposition for PCA?


#1

As a long time R user I was always under the impression that SVD is superior to and eigenvalue decomposition of the correlation matrix (see the documentation for prcomp and princomp) in terms of numerical accuracy and computational speed. The implementation of PCA in MultivariateStats.jl uses SVD if the the number of dimensions is smaller than the number of observations.

What are the computational trade-offs of SVD and eigenvalue decomposition of the covariance matrix?

edit: the question was unclear, rephrased it.


#2

A reasonible answer can be found on this page https://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca. The differences should not be significant (unless there are some implementation hacks…) In practice, you can try both and see what works, as there are matrices where calculating the covariances causes loss of precision…


#3

I don’t think this is a Julia-specific question. But note that when

A = U\Sigma V^\top

we get

A^\top A = V \Sigma^2 V^\top

so the two are very closely related, especially if A itself is symmetric.

I am not sure what this means, can you be more specific, eg link the relevant section?


#4

I put the question the wrong way, sorry.

I am well aware about the relation between the two of them. My question was about the computational aspects, i.e. numerical accuracy and computation time.

In R:

?prcomp

says:


Details:
The calculation is done by a singular value decomposition of the
(centered and possibly scaled) data matrix, not by using ‘eigen’
on the covariance matrix. This is generally the preferred method
for numerical accuracy.

SVD also gives the predictions of the training values for free. So I was always under the impression, that SVD is superior. Except for very large data, because it is easy to construct a covariance matrix incrementally.


#5

In exact arithmetic (no rounding errors etc), the SVD of A is equivalent to computing the eigenvalues and eigenvectors of AᵀA. However, computing the “covariance” matrix AᵀA squares the condition number, i.e. it doubles the number of digits that you lose to roundoff errors. Of course, if you start with 15 digits and your matrix has a condition number of 10², so that you are only losing about 2 digits, then squaring the condition number to 10⁴ still leaves you with 10+ digits. But if your matrix is badly conditioned then you could have a big problem, especially if you have inaccurate data to start with. That’s why SVD methods typically work by bidiagonalization or similar methods that avoid forming AᵀA explicitly.

However, for PCA my understanding is that there is a saving grace, because in PCA you are usually only interested in a few largest singular values. In particular (covered in e.g. lecture 31 of Trefethen and Bau Numerical Linear Algebra), computing the k-th singular value σₖ via AᵀA increases the errors (compared to bidiagonalization-based SVD) by a factor of ≈ σ₁/σₖ where σ₁ is the largest singular value. In consequence, as long as you are only interested in singular values and vectors close in magnitude to σ₁, as is typical in PCA, then working with the covariance matrix is not so bad.

See also the discussion in https://github.com/JuliaLang/julia/pull/21701 regarding the svds function, which is typically used to compute only a few largest singular values (like for PCA) and hence the AᵀA method is attractive for computational efficiency despite the potential accuracy loss.


#6

I used TSVD.jl for text analysis and seems to be working well …

julia> using SparseArrays, Arpack, TSVD
       using BenchmarkTools
       dims = 15
       A = sprand(1000,1000,0.1)
       @btime svds(A, nsv=dims);
       @btime tsvd(A, dims);
  59.119 ms (1096 allocations: 922.22 KiB)
  245.278 ms (4002 allocations: 42.94 MiB)

svds seems quite performant so a good choice however Arpack does not build on julia 1.0 (tests above are done on 1.2-DEV.166)