I’m trying to get the first 50 dominant eigenvectors of a huge symmetric matrix, of dimension 65000 x 65000. I’m doing this on a computer cluster, so there’s no RAM limitation. But it still takes forever. In fact I don’t even know how long – I let it run for 2 days and killed the job because it hadn’t finished yet. The code is simple:

ev = eigvecs(eigen(Symmetric(P), 64950:65000))

Where P is an {Int64} matrix.

Is there a faster function besides eigen() that I should be using here? This was the only one I could find that allowed me to specify the K largest (or smallest) eigenvectors, but perhaps others do. This also doesn’t seem to be running parallel, which would otherwise speed things up massively as well, as I likewise could split it across hundreds of cores if there is a way to parallelize.

I guess it is a sparse matrix. In any case, have you considered iterative methods? I use a lot KrylovKit.jl for computing egen elements. You also have IterativeSolver.jl

Yes – I only want the first 50 eigenvectors with the largest corresponding eigenvalues. I’ve never heard of block power iterations before, is there a Julia routine for it?

And you’re correct – the matrix is not sparse. It’s a distance matrix, with zeros on the diagonal, but non-zero (strictly positive) everywhere else.

You might as well use restarted Lanczos or some other Krylov algorithm, no? That should be strictly better than block-power iterations.

Krylov methods aren’t limited to sparse matrices — you just need a way to multiply matrices by vectors. If you have a fast way to multiply matrices by vectors (either because of sparsity or some other structure, e.g. using fast-multipole methods), then they are faster, but you can also use dense matrices, especially if you want the largest-|λ| eigenvalues so that you don’t need shift-and-invert.

Where are you getting such a large dense matrix from? Maybe it has some other structure you can exploit. (At the very least you can parallelize the matrix-vector multiplication.)

This is a phylogenetic distance matrix for 60,000 animal species. The goal is to find the eigenvectors that best explain the phylogenetic dissimilarity among species. Unfortunately it’s inherently dense.

It might be possible to form an approximate “compressed” representation of this matrix that you can multiply by quickly. For example, a lot of distance-like matrices have (approximate) hierarchical low rank structure or other combinations of low-rank structures.

It might also be worth trying this calculation with MKL.jl since MKL has a really really fast reduction to symmetric tridiagonal form. Particularly, if you happen to have a machine with many cores.

as @stevengj suggests, your matrix may still have a lot of structure despite being dense. I have a package for H-matrices that implements a particularly simple hierarchical format called HODLR (“hierarchically off-diagonal low rank”). This of course only helps if there is structure to be exploited, and if you’re interested I (or somebody else) can expand on common sources of that structure.

On my machine (an intel i5-11600K with a completely broken BLAS-3 until openblas 3.14 comes to Fedora), in the case of a matrix that very much does have that structure, I can assemble a 60k square matrix in about 130ms. And for matrices that would actually fit in RAM, you could check that the matvec accuracy is quite good. Probably not quite tol (see below), but probably better than 10*tol. Here’s an example:

using KernelMatrices, KernelMatrices.HODLR, BenchmarkTools
# this could also have the syntax kernel_entry(j, k, parameters), or
# kernel_function(ptj, ptk, parameters), where you'd construct K with a vector
# of points instead of the integer indices.
function kernel_entry(j,k)
exp(-abs(j-k)/100)
end
K = KernelMatrix(1:60_000, 1:60_000, kernel_entry)
tol = 1e-14 # ACA fact precision
maxrank = 0 # maximum rank
level = HODLR.LogLevel(8) # log2(n) - 8 dyadic 2x2 splittings
# 130 ms with a single thread on my computer. Not too bad.
# if you want a direct solve, you can also factorize this matrix,
# and assuming that the off-diagonal block rank is bounded then the
# factorization and solve will be O(n log^2 n).
@btime HK = KernelHODLR($K, $tol, $maxrank, $level, nystrom=false)

The package really depends on being able to conveniently get individual elements of your matrix, though. So if you have something that looks like \textbf{X} \textbf{X}^T or something, unless \textbf{X} is pretty skinny then this might not be super helpful. But maybe something to look into.