# `svdvals` is inefficient for tall matrices

For a tall matrix `X`, calling `svdvals(X)` directly is noticeably slower than computing `sqrt.(svdvals(X'X))`, despite the overhead of the extra matrix-matrix multiply:

``````julia
using BenchmarkTools: @btime
using LinearAlgebra: svdvals
using InteractiveUtils: versioninfo

X = rand(ComplexF32, 1000, 100)
f1(x) = svdvals(x)          # tall
f2(x) = sqrt.(svdvals(x'x)) # small and square
@assert f2(X) ≈ f1(X)       # yes, they match
@btime f1($X)
@btime f2($X)
versioninfo()
``````

If this is consistent for tall matrices in general, should we build this behavior into `svdvals` directly? Or is the “squaring the condition number” concern the reason for keeping it as is? (See the sketch after the output below.)

Output:

``````
  5.257 ms (9 allocations: 842.94 KiB)
  3.316 ms (12 allocations: 218.50 KiB)
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, broadwell)
Environment:
``````
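
To make the “squaring the condition number” concern concrete, here is a minimal sketch, assuming a synthetic `Float32` matrix with known singular values spanning four orders of magnitude (the construction and the chosen condition number of 1e4 are illustrative):

``````julia
using LinearAlgebra: svdvals, qr, Diagonal

# Synthetic tall Float32 matrix with known singular values from 1 down to 1e-4.
U = Matrix(qr(randn(Float32, 1000, 100)).Q)  # 1000×100 with orthonormal columns
V = Matrix(qr(randn(Float32, 100, 100)).Q)
s = Float32[10.0^(-4i/99) for i in 0:99]
A = U * Diagonal(s) * V'

s_direct = svdvals(A)           # absolute error ~ eps(Float32) * s[1]
s_gram   = sqrt.(svdvals(A'A))  # cond(A'A) = cond(A)^2 ≈ 1e8, beyond 1/eps(Float32) ≈ 8e6
@show s[end] s_direct[end] s_gram[end]  # the smallest value suffers in the Gram route
``````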

Perhaps this question is somewhat related to

2 Likes

What you found is a well-known trick to speed up the calculation of the SVD for substantially wide or tall matrices, as described here. We also made use of this in this topic:

Maybe it would be worth implementing a `fastsvd` function in `LinearAlgebra` as well.
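
For illustration, a minimal sketch of what such a function could look like; `fastsvdvals` and the aspect-ratio threshold are hypothetical, not an existing `LinearAlgebra` API:

``````julia
using LinearAlgebra: svdvals

# Hypothetical sketch: use the Gram-matrix trick only when the matrix is
# strongly rectangular; otherwise fall back to the plain routine.
function fastsvdvals(X::AbstractMatrix; ratio::Real = 4)
    m, n = size(X)
    m >= ratio * n && return sqrt.(svdvals(X' * X))  # tall: n×n problem
    n >= ratio * m && return sqrt.(svdvals(X * X'))  # wide: m×m problem
    return svdvals(X)                                # near-square: no gain
end
``````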

1 Like

If you form `X'*X`, you want to compute a symmetric eigenvalue decomposition (`eigvals(Hermitian(X'X))`) instead of an SVD of `X'*X`. That should be even faster, because the solver then knows that only one unitary matrix is involved instead of two. In principle this approach is less accurate.
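
A sketch of that variant (the `max.(0, …)` clamp is my own guard against tiny negative eigenvalues from round-off; it is not strictly needed for this well-conditioned example):

``````julia
using LinearAlgebra: Hermitian, eigvals, svdvals

# Hermitian() dispatches to the symmetric eigensolver. eigvals returns
# ascending eigenvalues while svdvals is descending, hence the reverse.
f_eig(x) = sqrt.(max.(0, reverse(eigvals(Hermitian(x'x)))))

X = rand(ComplexF32, 1000, 100)
@assert f_eig(X) ≈ svdvals(X)
``````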

A better approach is to first do a thin QR decomposition of `X` and then do an SVD of the small square matrix `R`, so that you have
`X = QR = (Q U) S V'`

e.g., if you only want the singular values: `f3(x) = svdvals(qr(x).R)`.

This is faster than calling `svdvals(X)` directly (which I find strange; I would think LAPACK should be smart enough to do this itself), but slower than the approach based on `X'*X`. The QR decomposition is not that much more costly than the matrix multiplication `X'*X` (for an m×n matrix with m ≫ n, roughly 2mn² flops versus mn² flops when the symmetry of the product is exploited), but I assume that the latter is just much more optimized.
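
For completeness, a sketch that puts the variants from this thread side by side on the same matrix (timings will of course depend on the machine and the BLAS in use):

``````julia
using BenchmarkTools: @btime
using LinearAlgebra: svdvals, qr, Hermitian, eigvals

X = rand(ComplexF32, 1000, 100)
@btime svdvals($X)                                  # direct SVD
@btime sqrt.(svdvals($X' * $X))                     # Gram matrix + SVD
@btime sqrt.(reverse(eigvals(Hermitian($X' * $X)))) # Gram matrix + symmetric eig
@btime svdvals(qr($X).R)                            # thin QR + SVD of R
``````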

4 Likes