[ANN] FastRandPCA - Fast Randomized PCA for Sparse Data

That’s not surprising. It’s essentially the same reason the normal equations A^T A x = A^T b are the fastest way to solve least-squares problems, and yet people use QR or SVD instead: to avoid squaring the condition number. See e.g. Efficient way of doing linear regression - #33 by stevengj and [ANN] LinearRegressionKit - #10 by stevengj
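A minimal sketch of the condition-number squaring (the matrix here is a made-up ill-conditioned example, not from the package):

```julia
using LinearAlgebra

# A small, deliberately ill-conditioned matrix:
A = [1.0 1.0; 1e-6 0.0; 0.0 1e-6]

# Forming the normal equations A'A squares the condition number,
# which is why QR- or SVD-based solvers are usually preferred:
cond_A   = cond(A)
cond_AtA = cond(A' * A)   # ≈ cond(A)^2
```
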

But, as you say, if one is computing only a few of the largest singular values then maybe this isn’t as big of an issue? Especially if your accuracy needs are low.

That shouldn’t be surprising — iterative methods are mostly advantageous for sparse problems (or other problems with fast matrix–vector multiply), not for dense matrices.
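To illustrate why sparsity matters here (sizes and density below are arbitrary): a sparse matrix–vector multiply only touches the stored entries, so it costs O(nnz(A)) rather than O(mn), which is exactly the operation Krylov/iterative methods build on.

```julia
using SparseArrays, LinearAlgebra

n = 1_000
A = sprandn(n, n, 0.01)   # ~1% nonzeros
x = randn(n)

# Only the nnz(A) stored entries are visited, not all n^2 slots:
y = A * x
```
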

(Note that you should also be careful to compare the accuracies when comparing randomized SVD to something like `svdsolve`, since otherwise you may be comparing apples and oranges.)

It’s reasonable to promote to a higher precision in some cases (especially from Float16), but what if the user gives you quad-precision data? Then you are downgrading the precision. (And if the data is complex, it is completely wrong.)

For example, if you wanted to promote to at least single precision, you could do `promote_type(Float32, eltype(A))`.
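A sketch of how that behaves (the `working_eltype` helper name is hypothetical, just for illustration): lower precisions get upgraded to Float32, while higher-precision and complex element types pass through untouched.

```julia
# Hypothetical helper: pick a working element type that is at least
# single precision, without downgrading anything richer.
working_eltype(A) = promote_type(Float32, eltype(A))

working_eltype(rand(Float16, 2, 2))           # Float32: upgraded
working_eltype(rand(Float64, 2, 2))           # Float64: left alone
working_eltype(rand(Complex{Float16}, 2, 2))  # ComplexF32: stays complex
```
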

(Float16 is both slow and limited in range (its maximum value is below 10^5). Float32, however, is fast and pretty useful for computation (it doesn’t overflow until about 3.4e38), so it’s a shame not to allow it.)
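The range claims above can be checked directly:

```julia
# Float16 maxes out well under 10^5, while Float32 goes up to ~3.4e38:
floatmax(Float16)   # 6.55e4
floatmax(Float32)   # 3.4028235f38
```
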
