I am looking to solve the eigenvalues of sparse symmetric matrix of ~ 14000x14000.
Eigfact is too slow, and didn’t give results after an hour running. (8gb ram i7 processor)
Are there special iteritive methods?
One post mentions julia pro linking to MKL but I can’t imagine it would give the type of speedup I need.
I’m new to Julia and programming in general so let me know if I’m missing something obvious.
Do you really need all the eigenvalues? If so, why? (There might be a better approach to your ultimate goal.)
Running on N cores gives roughly an N-fold speedup with MKL, I believe.
@ simonbyrne eigs seems to actually be a little slower.
@ stevengj I don’t necessarily need all the eigenvalue but they have to be at leat the n smallest values. Is there a better function for this?
@ dpsanders, I’ll try with MKL and report if there is a significant speedup.
Thank for the answers!
I just noticed that running the MKL version htop is showing only 100% on 4 cores with low values on the others while without hkl I see 100% on all 8. The core i7 uses hyper-threading which is why I see 8 cores.
Any inight on this?
You could try simeaultanious inverse iteration with Rayleigh shifts.
Linear algebra like this usually doesn’t benefit from hyperthreading; it’s better to run full out on whatever fits in cache. MKL knows this but for some reason OpenBLAS defaults to using all logical cores. Top shows 100% even if some cores are waiting for food.
For small eigenvalues,
eigs has to factor a matrix at every iteration. If you are using a sparse matrix type, that is done on one core, and is expensive. 14000 is not too large, so you are better off converting to a full matrix, which is factored fast with BLAS.
Make sure your matrix is really symmetric (
A=(A+A')/2, if need be). The general algorithms are much more expensive. Even
eigfact should finish in less than 30 min for a dense symmetric matrix of rank 14000.
n is much smaller than the size of the matrix, as is typical in many applications, this can be computed much more efficiently than computing all the eigenvalues.
If you only want a few eigenvalues, I’m skeptical that it is advantageous to convert a matrix of that size to dense.
For the smallest-magnitude eigenvalues, it should be able to re-use the same (sparse) factorization over and over again—
eigs does not need to re-factorize on every iteration—in which case solving for a few eigenvalues should be vastly faster than a dense solve.
Indeed, I just checked that this is what
eigs seems to do, and it appears to be reasonably efficient for sparse problems. I can solve for the 20 smallest-magnitude eigenvalues of a 100000x100000 real-symmetric matrix (coming from a 2d discretized Laplacian) in about 2 seconds on my laptop, orders of magnitude faster than a dense eigensolver would be even if I had enough memory (which I don’t — it would require more than 80GB of RAM). (Try the second example from this notebook, changing to
N=500 to increase the matrix size.)
My mistake, my recollection of the algorithm was confused.
What I meant to advocate was converting to dense and then calling
eigs, since the Krylov subspace schemes will indeed win for small
nev. I got 700s for sparse and 33s for dense 14000x14000 with
eigs (random symmetric structure, about 10% fill), vs. about 800s for a dense
eigfact. For very large and very sparse cases like your example, of course things will be as you show, especially with simple structure. My claim is just that when level 2 and 3 BLAS are practical, they often beat the sparse versions.
For the record, it seems this is particularly good performance because it’s a 2D sparsity pattern. With a modified version of your code for the first eigenvalue of the Laplacian, with about 21k dof in both cases, I get 0.09s in 2D, 1.29s in 3D (which is of course better than a dense eigensolver anyway). I seem to remember there’s an asymptotic difference in performance from the dimensionality of the problem.
10% fill is huge. There is usually no point in using a sparse solver if you have that many nonzero entries. (On the order of 10 nonzero entries per row is common in real sparse-matrix applications, which corresponds to < 0.1% fill for a 14000x14000 matrix.)
And sparse-direct solvers will generally perform poorly for random fill. The algorithms work best for sparsity that comes from local interactions on a mesh or grid.
1d grids are best, and 2d grids/meshes are still very good. For 3d grids, the scaling of sparse-direct methods is worse (this is a consequence of the difficulty of partitioning meshes in higher dimensions, essentially…see e.g. Davis’ book on sparse-direct methods). For very large 3d problems you usually have to switch to purely iterative methods (GMRES, CG, etc) and hope you can find a good preconditioner.
Distributed computing of x = A \ b
Wow thanks for all the suggestions. I have to say I wasn’t expecting such a good response on my first post here.
I actually found out that I had a dumb mistake in the code. I converted to a dense matrix to solve using eigfact,
and forgot to convert back to sparse to sparse for eigs. I have ~ .0003% filling so sparse is the way to go.
I can now solve the code with eigs:
@timev: 0.901941 seconds (10.80 k allocations: 4.219 MiB)
sorry correction more like .03% fill
for anyone curious here is the full comparison of @timev for eigs with sparse and dense matrices
0.901941 seconds (10.80 k allocations: 4.219 MiB)
elapsed time (ns): 901940886
bytes allocated: 4423584
pool allocs: 10789
non-pool GC allocs:1
malloc() calls: 6
194.259476 seconds (149.69 k allocations: 1.541 GiB, 0.00% gc time)
elapsed time (ns): 194259475636
gc time (ns): 1994472
bytes allocated: 1655020183
pool allocs: 149651
non-pool GC allocs:30
malloc() calls: 5
GC pauses: 1