[ANN] ArnoldiMethod.jl v0.4

Hi all, I’ve released ArnoldiMethod.jl v0.3.5 and v0.4.0 recently, after a relatively long break from writing Julia code.

See GitHub - JuliaLinearAlgebra/ArnoldiMethod.jl: The Arnoldi Method with Krylov-Schur restart, natively in Julia.

ArnoldiMethod.jl can be used to compute eigenvalues and eigenvectors (or Schur vectors) of large and sparse matrices. If you’ve used MATLAB, you may know the eigs function, which does something similar.

The package implements the Krylov-Schur algorithm, which is a slight improvement over the Implicitly Restarted Arnoldi Method you may know from ARPACK and MATLAB’s eigs function.
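If you haven’t used it before, here is a minimal sketch of the typical two calls, loosely following the package README (the matrix and the keyword values are just placeholders; I’m assuming the v0.4 style where the target is given as a Symbol like :LM):

```julia
using ArnoldiMethod, SparseArrays

# Toy sparse matrix; in practice A is your large sparse operator.
A = spdiagm(0 => 1.0:100.0) + sprandn(100, 100, 0.01)

# Partial Schur decomposition targeting the 6 eigenvalues of largest magnitude.
decomp, history = partialschur(A; nev = 6, tol = 1e-10, which = :LM)

# Transform the Schur vectors into eigenvalues and eigenvectors.
vals, vecs = partialeigen(decomp)
```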

The ArnoldiMethod.jl package should be much more stable than Arpack.jl, and slightly more stable than KrylovKit.jl’s eigsolve / schursolve functions. It should also have better performance than KrylovKit.jl (mostly because it pre-allocates and works in-place as much as possible). If not, open an issue.

Finally, ArnoldiMethod.jl is, I believe, the only implementation of a partial Schur decomposition where everything is implemented natively in Julia, meaning that it works with arbitrary number types (I’ve only tested DoubleFloats.jl and BigFloat).

There are no algorithmic differences between v0.4 and v0.3. It’s just that v0.4 exports fewer names.

What’s new in both v0.3 and v0.4:

  • You can provide an initial guess via partialschur(A; v1=...), where v1 is the vector that induces the Krylov subspace (see the sketch after this list).
  • You can pre-allocate large arrays and temporaries using the ArnoldiWorkspace struct, and run the algorithm in-place using partialschur!
  • You can reuse an existing partial Schur decomposition in an ArnoldiWorkspace and continue to compute additional eigenvalues and Schur vectors.
  • Stability of restart is improved: converged eigenvectors whose eigenvalues are far from the target are properly purged (removed from the Krylov subspace), so the algorithm is unlikely to do computations on an array that mixes very large and very tiny numbers, which would otherwise lead to loss of precision. Also, in one crucial spot where large and tiny numbers are expected, Givens rotations are now used instead of Householder reflections.
  • Fixed a few bugs, mostly related to locking of converged vectors
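A minimal sketch of the initial guess feature, with a random placeholder starting vector (for ArnoldiWorkspace and partialschur! see the docstrings, since their signatures aren’t shown here):

```julia
using ArnoldiMethod, SparseArrays, LinearAlgebra

A = sprandn(1_000, 1_000, 0.005)

# Placeholder starting vector; v1 is the vector that induces the Krylov subspace.
x0 = normalize!(randn(1_000))

decomp, history = partialschur(A; nev = 4, v1 = x0)
```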

What’s still not in v0.3 and v0.4 after all these years:

A utility function like eigs that directly returns eigenvalues and eigenvectors. Right now ArnoldiMethod.jl still works in two steps: first compute a partial Schur decomposition, then transform it into an eigendecomposition. I’m open to PRs.
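For reference, such a utility could be as small as the following sketch (eigs_like is a hypothetical name, not something the package exports):

```julia
using ArnoldiMethod

# Hypothetical convenience wrapper: run the two steps in one call,
# mimicking what an eigs-style function might look like.
function eigs_like(A; nev = 6, kwargs...)
    decomp, history = partialschur(A; nev = nev, kwargs...)
    vals, vecs = partialeigen(decomp)
    return vals, vecs, history
end
```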

Give it a try!


Concerning the desirable extensions: for the “vibration” problem (a symmetric generalized eigenvalue problem, looking for the smallest eigenvalues), the package GitHub - PetrKryslUCSD/VibrationGEPHelpers.jl: Generalized Eigenvalue Problem helper functions wraps ArnoldiMethod.jl (and other similar packages), and so provides a nearly uniform interface similar to eigs.


I’ve been using this package with great success to compute the stationary distribution of Markov transition matrices. As promised, it delivers a modest performance gain relative to KrylovKit and a large gain relative to Arpack.
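For anyone curious, here is a rough sketch of that use case, assuming a row-stochastic transition matrix (the toy matrix below is just a placeholder; the eigenvalue 1 of a stochastic matrix is the one of largest magnitude, which is the default target):

```julia
using ArnoldiMethod, SparseArrays, LinearAlgebra

# Toy row-stochastic transition matrix (rows sum to 1); in practice P is the
# large sparse Markov transition matrix.
n = 1_000
P = sprand(n, n, 1e-2) + spdiagm(0 => fill(0.1, n))
P = P ./ sum(P, dims = 2)

# The stationary distribution is the eigenvector of P' for eigenvalue 1.
decomp, history = partialschur(sparse(transpose(P)); nev = 1, tol = 1e-12)
vals, vecs = partialeigen(decomp)

statdist = real.(vecs[:, 1])
statdist ./= sum(statdist)   # normalize so the probabilities sum to 1
```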

@stabbles, could you explain the extent to which it takes advantage of multithreading? Does it use the number of threads specified by Julia (e.g., Threads.nthreads())? Or does it use the number of BLAS threads?

I find that it does not achieve very high CPU utilization during the partial Schur decomposition. The transition matrix I’m working with is a sparse matrix of about 25,000 x 25,000 with about 60% fill.

I’m not @stabbles, but I took a look at the code and didn’t see any Julia-native use of threading (no use of the @threads or @spawn macros, and no dependence on other packages providing multithreading APIs). However, it uses LinearAlgebra.mul! and friends, which are multithreaded when used on array types that dispatch to BLAS (provided BLAS.get_num_threads() > 1).
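To check which of those settings applies on a given machine, standard-library calls suffice:

```julia
using LinearAlgebra

# Threads used by Julia itself vs. threads used by the BLAS backend;
# only the latter affects the dense mul! calls mentioned above.
@show Threads.nthreads()
@show BLAS.get_num_threads()

# Raise the BLAS thread count if it turns out to be 1.
BLAS.set_num_threads(Sys.CPU_THREADS)
```

Note that, as far as I know, the sparse matrix-vector product itself goes through SparseArrays’ single-threaded kernels, so BLAS threads only help the dense parts.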

That’s not a very sparse matrix. You’d probably have better luck using a regular dense Matrix.


Thanks. Very insightful. Unfortunately memory constraints preclude me from working with the transition matrix in dense form.

Any chance you can access a machine with more memory? Or use Float32? We’re talking about less than a factor of 2 here; sparse matrices are typically used when they cut down on memory by orders of magnitude and can be brutally slow for practically dense matrices like this.

Actually, I misstated the size and fill rate. Apologies for that. The size is actually about 580,000 x 580,000, and the fill is less than 1%.

My eyeballing the fill from the spy() plot was apparently really inaccurate!


That makes more sense!

Most of the time should be taken by matrix-vector products for a problem of this size (profile to check). So the challenge is to make sure that the full memory bandwidth is being used; one often needs multiple threads to get that. Try the MKL sparse routines.
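A quick way to confirm where the time goes, using the standard profiler (the matrix below is just a placeholder for the actual transition matrix):

```julia
using Profile, ArnoldiMethod, SparseArrays

# Placeholder sparse matrix; substitute the real transition matrix.
A = sprandn(100_000, 100_000, 1e-4)

partialschur(A; nev = 1)              # warm up / compile first
@profile partialschur(A; nev = 1)     # then profile the real run
Profile.print(mincount = 50)          # expect sparse matvec frames to dominate
```

If the matvec does dominate, MKLSparse.jl is, as far as I know, the usual way to route sparse matrix-vector products through MKL’s multithreaded kernels.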

A GPU sparse matrix library might provide dramatic speed improvements.
For the size you mention, you might need a system with multiple GPUs and a GPU wizard to help design a partitioned product method for your matrix structure. It would be interesting to verify that ArnoldiMethod can take advantage of such things.

Does this package work with arrays on GPU?