Smallest magnitude eigenvalues of the generalized eigenvalue equation for a large sparse matrix

I have a generalized eigenvalue problem Hx=\lambda S x to solve, where H is Hermitian and S is positive definite. Both H and S are sparse matrices and their dimensions (size(H, 1)) are on the order of 10^5. Roughly 1% of the elements of H and S are nonzero. A few eigenvalues and eigenvectors are needed around a specific number \mu.

My current computing resources are nodes with 128 cores and 256 GB of memory each, and I can use as many as 8 nodes. I have the following environment settings:

export JULIA_NUM_THREADS=128
export OMP_NUM_THREADS=128

and the relevant code is

eigs(H - μ * S, S, which=:SM, nev=6)

With the eigs function from Arpack.jl, the computation appears to be completely serial (the CPU usage is 100% according to top). However, if I run eigs(H - μ * S, S, nev=6), the CPU usage is roughly 2800%, but that's not what I want, because the default behavior is to find the eigenvalues of largest magnitude (which=:LM).

Right now, I haven't been able to successfully obtain the eigenvalues (closest to \mu) and eigenvectors even once with the current computing resources. Any general suggestions for solving such a problem would be very helpful, including

  1. trying out other packages,
  2. fixing the weird behavior where only 1 core is used when eigs is instructed to find the eigenvalues of smallest magnitude. Even when finding the eigenvalues of largest magnitude, eigs still does not use all of my cores (128 in total).

Check Home · ArnoldiMethod.jl


eigs is a good tool, but use the argument explicittransform=:none. The default for this is definitely not what you want. It makes the computation expensive and potentially non-convergent.
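
In the notation of the original post, the call would look something like this (just a sketch, with H, S, and μ as defined above):

eigs(H - μ * S, S, nev=6, which=:SM, explicittransform=:none)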


Thanks.

Can you explain a bit more? There is little documentation about the explicittransform argument.

In addition, I just tried to use explicittransform=:none but still only one core is used in the computation.

Refer to this issue: https://github.com/JuliaLinearAlgebra/Arpack.jl/issues/130
The use of threads in Arpack is not something I know a whole lot about. Sorry.


I now have some understanding of this problem.

To calculate the eigenvalues of smallest magnitude, the underlying routine has to perform a shift-and-invert transformation. This transformation uses factorize from LinearAlgebra, which cannot use multithreading and takes a lot of memory.
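
As far as I understand, the identity behind shift-and-invert is: if H x = \lambda S x, then

(H - \mu S)^{-1} S x = \frac{1}{\lambda - \mu} x,

so the largest-magnitude eigenvalues of (H - \mu S)^{-1} S correspond to the \lambda closest to \mu, and only solves with H - \mu S (i.e. a factorization) are needed, never an explicit inverse.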

My current understanding is that there is no good way to solve my problem, since the inverse of a sparse matrix is generally dense. Using other solvers, such as ArnoldiMethod.jl or IterativeSolvers.jl, won't actually help either.

Please correct me if I am wrong. If there are other ideas that could help solve my problem, I would be more than happy to hear them.

If H and S are real, I can say from experience that Arpack works very well. I have no experience with complex matrices, though. For instance: https://github.com/PetrKryslUCSD/FinEtoolsFlexStructures.jl/blob/110db0474c6e9c7e97d17d01eb215c4edce9fde3/examples/shells/dynamics/dcbs_vibration_examples.jl#L87

If the advice from @PetrKryslUCSD does not help you, another option is to use preconditioners, although I have only tested them in the context of EV, not GEV. You can define a function that applies the inverse of A using a preconditioned iterative linear solver and feed it to most EV (GEV?) solvers. See also this example from ArnoldiMethod.jl, as mentioned by @mohamed82008.

You could also check KrylovKit.jl for another GEV solver.

If you don't have a preconditioner, you can use IncompleteLU.jl.
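
For instance, a rough sketch (untested on this particular problem; H, S, and μ are as in the original post, and the drop tolerance τ is problem-dependent):

using SparseArrays, IncompleteLU, IterativeSolvers

A = H - μ * S                      # shifted matrix from the original post
P = ilu(A, τ = 1e-4)               # incomplete LU factorization, used as a preconditioner
b = rand(ComplexF64, size(A, 1))   # an arbitrary right-hand side, just for illustration
x = gmres(A, b; Pl = P)            # left-preconditioned iterative solve of (H - μS) x = b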

If \mu is inside the spectrum, this is a sparse interior eigenvalue problem, which is in general a hard problem. Things to look at include shift-and-invert methods with iterative solves and preconditioners, spectrum folding (look for the smallest eigenvalue of (H - \mu)^2), or sparse direct solvers. The relative performance of all of these depends a lot on your problem: the availability of good preconditioners, the spectrum distribution, the dimensionality of the underlying physical space if any, etc. From your notation I would imagine it's a quantum mechanical problem: what's the underlying system and discretization?

Does this work for you: https://gist.github.com/fgerick/9bc3272e79c63f77f141d5a2ba6a5dea ?

Thank you all for the suggestions. I have now successfully figured out how to solve this problem in parallel.

Before I present my program, let me respond to all the suggestions:

  1. @PetrKryslUCSD suggests using eigs directly with explicittransform=:none. This actually works, although it took me 9 hours to compute 6 eigenvalues and eigenvectors. This is in principle acceptable, but I am hoping for a quicker solution, since I have a dozen similar generalized eigenvalue equations to solve.

  2. @rveltz and @antoine-levitt mentioned preconditioners. Although I read the Wikipedia article on this topic, I still don't fully understand how to use and choose preconditioners for this problem.

  3. @antoine-levitt suggested spectrum folding with (H - \mu)^2, but I am solving an H x = \lambda S x type problem, and a similar trick here would involve S^{-1}, which seems to be very slow.

  4. @antoine-levitt This is indeed a quantum mechanical problem. It is a non-interacting many-particle eigenstate problem with thousands of atoms, so the Hilbert space is very large; as I said, its dimension is on the order of 10^5. I am looking for the highest occupied state.

My solution is still to use the shift-and-invert transformation. The issue is that the default matrix factorization in Julia only utilizes one core and is relatively slow in my case (I believe the 9 hours of eigs were mostly spent in factorize). Therefore, I chose the Pardiso.jl solver, which utilizes all 128 cores. The relevant program is

using LinearAlgebra, Serialization
using Pardiso, Arpack, LinearMaps

function construct_linear_map(H, S)
    # Set up an MKL Pardiso solver for a complex Hermitian indefinite matrix
    ps = MKLPardisoSolver()
    set_matrixtype!(ps, Pardiso.COMPLEX_HERM_INDEF)
    pardisoinit(ps)
    fix_iparm!(ps, :N)
    H_pardiso = get_matrix(ps, H, :N)
    b = rand(ComplexF64, size(H, 1))    # dummy right-hand side for the setup phases
    # Symbolic analysis followed by numerical factorization of H (here H - μS)
    set_phase!(ps, Pardiso.ANALYSIS)
    pardiso(ps, H_pardiso, b)
    set_phase!(ps, Pardiso.NUM_FACT)
    pardiso(ps, H_pardiso, b)
    return (
        # The map applies x ↦ (H - μS)^{-1} (S x) using the stored factorization
        LinearMap{ComplexF64}(
            (y, x) -> begin
                set_phase!(ps, Pardiso.SOLVE_ITERATIVE_REFINE)
                pardiso(ps, y, H_pardiso, S * x)
            end,
            size(H, 1);
            ismutating=true
        ),
        ps
    )
end

lm, ps = construct_linear_map(H - μ * S, S)
λs_inv, X = eigs(lm, which=:LM)       # largest-magnitude eigenvalues of (H - μS)^{-1} S
set_phase!(ps, Pardiso.RELEASE_ALL)   # release Pardiso's internal memory
pardiso(ps)
λs = μ .+ 1 ./ λs_inv                 # λs_inv = 1/(λ - μ), so shift back to recover the λ closest to μ

Now it only takes me roughly 5 minutes to get the eigenvalues and eigenvectors, which is actually much better than what I expected.


You don't need to compute the inverse. You just need a sparse factorization (e.g. bunchkaufman, since your problem is Hermitian indefinite) to implement the action (linear operator) of the inverse on a vector.

In numerical linear algebra, always read x = A^{-1} b as "solve Ax = b by the best available method", not as "compute the inverse of A and then multiply it by b". Remember, you only need to provide a linear operator to eigs (or similar), not necessarily an explicit matrix.

Your matrix sounds like it isn't very big (10^5 \times 10^5 is very moderate for sparse matrices), so this doesn't seem like it should be a hard problem with the right algorithms.
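
To make that concrete, here is a minimal sketch of the pattern (assuming H, S, and μ are defined as in the original post; lu is just a stand-in for whichever sparse factorization you end up using):

using SparseArrays, LinearAlgebra, LinearMaps, Arpack

F = lu(H - μ * S)    # sparse factorization; the inverse is never formed explicitly
# The operator applies x ↦ (H - μS)^{-1} (S x) by solving with the stored factorization
op = LinearMap{ComplexF64}((y, x) -> ldiv!(y, F, S * x), size(H, 1); ismutating = true)
νs, X = eigs(op, nev = 6, which = :LM)   # νs ≈ 1/(λ - μ)
λs = μ .+ 1 ./ νs                        # eigenvalues of H x = λ S x closest to μ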


Thanks, I think I understand this point now, but from the documentation of factorize, Julia does LDLt instead of Bunch-Kaufman for sparse Hermitian matrices.

That's the part that I don't fully understand. As I said in the previous post, eigs took 9 hours to get the eigenvalues and eigenvectors (~64 GB of memory was consumed). In addition, the time and memory that ldlt takes increase very rapidly with size:

julia> @time ldlt(H[1:10000, 1:10000])
  8.826993 seconds (70 allocations: 960.709 MiB, 0.15% gc time)
SuiteSparse.CHOLMOD.Factor{ComplexF64}
type:    LDLt
method:  simplicial
maxnnz:  36191307
nnz:     7011017
success: true


julia> @time ldlt(H[1:20000, 1:20000])
 95.049171 seconds (82 allocations: 5.909 GiB, 0.08% gc time)
SuiteSparse.CHOLMOD.Factor{ComplexF64}
type:    LDLt
method:  simplicial
maxnnz:  244502628
nnz:     32896447
success: true

It is possible that serial SuiteSparse just isn't very good at this specific problem (Pardiso seems to be much quicker with 128 cores), but it also seems that factorizing sparse matrices of this kind becomes increasingly difficult as the size grows.

Note that you can use any factorization you want (you don't have to rely on the default in factorize) if you pass your own linear-operator object. (But bunchkaufman doesn't support sparse matrices, my bad.) So you can try lu, MUMPS.jl, Pardiso.jl, etcetera.

In general, the efficiency of sparse-direct solvers depends a lot on the sparsity pattern.
Unfortunately, I now notice that you mentioned 1% sparsity, which is pretty large (≈ 1000 nonzeros per row). This makes 10^5 \times 10^5 a much tougher problem than a similar-size matrix arising in e.g. a finite-element calculation.
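As a rough back-of-the-envelope estimate (assuming ComplexF64 storage): nnz \approx 0.01 \times (10^5)^2 = 10^8 nonzeros per matrix, which is about 1.6 GB just for the stored values before counting indices, and typically far more once a factorization fills in.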

When sparse-direct solvers run out of steam, the alternative is an iterative solver (ideally a preconditioned iterative solver). That is, you pass a linear operator to eigs, where the linear operator takes any vector b and solves Ax=b by calling KrylovKit.jl or IterativeSolvers.jl or similar. But getting these to work well can be a lot of effort, and you are still hampered by the slowness of working with such a low-sparsity matrix.


Can you add some details? Is this like a tight-binding Hamiltonian with relatively few degrees of freedom per atom, or like a finite element Hamiltonian with lots? What is the dimension? What does the molecule/solid look like?

If there are relatively few occupied states (e.g. ~100) and the spectral radius is reasonable, you can use an iterative method to determine all occupied states (typically LOBPCG is a good choice; that's what I use). If the spectral radius is large, you can use a preconditioned iterative method (but you need a good preconditioner; again, that depends on the specifics of your problem). If there are a lot of occupied states, then that's the hard case. There is quite a bit of research on good eigensolvers for quantum mechanical problems, so I'd take a look at that.
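
For reference, a generalized LOBPCG call with IterativeSolvers.jl might look roughly like this (a sketch only; nev = 100 is a hypothetical number of occupied states, a preconditioner can be supplied via the P keyword, and the exact keywords and result fields should be checked against the package documentation):

using IterativeSolvers

nev = 100                                                  # hypothetical number of occupied states
r = lobpcg(H, S, false, nev; tol = 1e-6, maxiter = 2000)   # false => smallest eigenvalues of H x = λ S x
λs, X = r.λ, r.X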


We worked on that problem a while ago, and the results appeared encouraging. Implementing the algorithm at the time was somewhat cumbersome because we needed to implement all the sparse matrix algebra ourselves, but with the Julia ecosystem it may take no more than a few lines of code. I will try to revisit it if I have some time; in any case, it can be a starting point for finding other implementations and methods.

A usage more directly aimed at finding eigenvalues around μ is to pass μ as the keyword argument sigma rather than explicitly constructing H - μ * S:

eigs(H, S, sigma=μ, nev=6)

I don't think it will make a huge difference in performance, though.


@antoine-levitt This is a two-dimensional tight-binding Hamiltonian for a large-scale TMD structure. Unfortunately, I need roughly the 30000th-30020th eigenvalues out of 117090 eigenvalues, so determining all occupied states is not really an option. As mentioned by @stevengj, the sparsity is quite low: the matrices are 117090 \times 117090 with roughly 1% of the entries nonzero.

@lmiq Thanks for bringing the paper to my attention. Do I understand correctly that your method does not actually solve the eigenvalue equation? I have already finished the mean-field self-consistent calculation with third-party software and am now trying to do some analysis, so I really need the eigenvalues and eigenvectors.

I already have a reasonably working solution; it is good for me to have a rough idea of how eigs works.

Correct, you can obtain the density matrix from there, but not the eigenvalues.

That assumes the density matrix is sparse (like linear scaling DFT methods), right? Which is not the case here presumably.

That is indeed a tricky case.

Have you considered Green function methods? I.e., if you're studying something like a local perturbation of a periodic system, express the Green function of the perturbed system as a function of the Green function of the periodic one (which you can compute in reciprocal space), and get the eigenvalues from there. The general idea is "solve then truncate rather than truncate then solve". I happen to be super interested in this type of problem right now (we're finishing up a paper on computing resonances in this type of system), so if you send me more details (in a PM if necessary) I'd be happy to take a closer look.