Svdvals is inefficient for tall matrices

For a tall matrix X, calling svdvals(X) directly is noticeably slower than calling it on svdvals(X'X) despite the overhead of the extra matrix-matrix multiply:

using BenchmarkTools: @btime
using LinearAlgebra: svdvals
X = rand(ComplexF32, 1000,100)
f1(x) = svdvals(x) # tall
f2(x) = sqrt.(svdvals(x'x)) # small and square
@assert f2(X) ≈ f1(X) # yes, they match
@btime  f1($X)
@btime  f2($X)
versioninfo()

If this is consistent for tall matrices in general, should we build this behavior into svdvals directly? Or is the “squaring the condition number” concern the reason for keeping it as is?

Output:

  5.257 ms (9 allocations: 842.94 KiB)
  3.316 ms (12 allocations: 218.50 KiB)
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 80

Perhaps this question is somewhat related to

2 Likes

What you found is a well known trick to speed up the calculation of the SVD for substantially wide or tall matrices as described here. We also made use of this in this topic:

Maybe it is worth it to implement a fastsvd function into LinearAlgebra as well.

1 Like

If you make X'*X you want to compute a symmetric eigenvalue decomposition (eig(X'*X)) instead of an SVD of X'*X, that should be even faster because it knows that in that case there is only one instead of two unitary matrices involved. In principle this approach is less accurate.

A better approach is to to first do a thin QR decomposition of X and then doing an SVD of the square and small matrix R, such that you have
X = QR = (Q U) S V'

e.g., if you only want the singular values: f3(x) = svdvals(qr(X).R).

This is faster than directly svdvals(X) (which I find strange, I would think Lapack should be smart enough to do this itself), but slower than the appraoch based on X'*X. The QR decomposition is not that more costly as matrix multiplication X'*X, but I assume that the latter is just much more optimized.

4 Likes