I am interested in the eigenspectrum of Hermitian matrices with a “pentadiagonal” structure, i.e. A[m,n] = 0 if abs(m-n) > 2. I am most interested in eigenpairs whose eigenvalues are close to zero. At present I am using Arpack.jl’s shift-invert functionality with SparseArrays. However, I was wondering if there is an efficient implementation for matrices with this bandwidth, similar to the built-in Tridiagonal and SymTridiagonal types, since the matrices tend to be dense along the 5 diagonals. My typical use case is matrix size ~50000×50000, extracting O(100) eigenvectors.
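For concreteness, what I am currently doing is roughly the following (just a sketch; the randn diagonals are placeholders for how my actual Hermitian matrix is assembled):

using Arpack, SparseArrays, LinearAlgebra

# Placeholder pentadiagonal matrix built from its five (dense) diagonals.
m = 50_000
d0 = randn(m); d1 = randn(m - 1); d2 = randn(m - 2)
A = spdiagm(-2 => d2, -1 => d1, 0 => d0, 1 => d1, 2 => d2)

# Shift-invert about sigma = 0 to target the eigenvalues closest to zero.
vals, vecs = eigs(A; nev = 100, sigma = 0.0)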
— Edit —
I have looked into BandedMatrices, which supports the factorization needed for Arpack’s shift-invert. However, this gives me essentially the same runtime as using plain sparse matrices.
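Schematically, the BandedMatrices version I tried is the same call with the matrix stored banded instead of sparse (again a sketch, assuming the diagonal-pair constructor analogous to spdiagm):

using Arpack, BandedMatrices, LinearAlgebra

# Same placeholder diagonals as above, now in a BandedMatrix with bandwidths (2, 2),
# so that the factorization of A - sigma*I inside eigs is a banded LU.
A = BandedMatrix(-2 => d2, -1 => d1, 0 => d0, 1 => d1, 2 => d2)

vals, vecs = eigs(A; nev = 100, sigma = 0.0)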
I can reproduce this — BandedMatrices.jl is 10x faster than SparseArrays at LU-factorizing a pentadiagonal matrix, but is actually slightly slower at using the LU factorization to solve for a particular right-hand side:
julia> using BandedMatrices, SparseArrays, LinearAlgebra, BenchmarkTools
julia> m = 50000; B = brand(m,m,2,2); S = SparseMatrixCSC(B);
julia> FB = @btime lu($B); FS = @btime lu($S);
1.622 ms (6 allocations: 3.09 MiB)
16.828 ms (86 allocations: 61.04 MiB)
julia> b = randn(m);
julia> @btime $FB \ $b; @btime $FS \ $b;
865.917 μs (3 allocations: 416.06 KiB)
561.125 μs (6 allocations: 832.12 KiB)
Since a shift-invert eigensolve creates the factorization only once but uses it to solve many right-hand sides, you won’t see an advantage from BandedMatrices.jl for this application.
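To make that factorize-once / solve-many structure explicit, here is roughly what a hand-rolled shift-invert with σ = 0 looks like (a sketch only, not what Arpack.jl does internally verbatim; it uses LinearMaps.jl, which isn’t mentioned above, and reuses S and m from the benchmark):

using Arpack, LinearMaps

F = lu(S)                      # factorize once -- the cheap step

# Each application of this operator is one backsolve against the stored factorization.
# (For a real-symmetric matrix you could pass issymmetric = true to use the symmetric driver.)
Sinv = LinearMap(x -> F \ x, m)

# ARPACK applies the operator at least a few hundred times (one per Krylov vector,
# more with restarts) to converge ~100 eigenpairs, so the backsolve dominates the runtime.
θ, ϕ = eigs(Sinv; nev = 100, which = :LM)
λ = 1 ./ θ                     # eigenvalues of S closest to zero (λ = σ + 1/θ with σ = 0)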
(I’m surprised that the greater regularity of BandedMatrices.jl doesn’t allow the LU backsolve to be faster too. cc @dlfivefifty)
I tried using AppleAccelerate (on an M4), and it didn’t make much difference in the benchmark above. Backsolve is still about 30% slower than with a SparseMatrix.
Note that both LAPACK and BLAS have their own triangular solvers. It’s possible to write a custom ldiv! that uses BLAS instead of LAPACK, in which case we get the same speed:
julia> using AppleAccelerate, BandedMatrices, SparseArrays, LinearAlgebra
julia> n = 1000; A = brand(n,n,2,2); F = lu(A); x = randn(n);
julia> function blas_ldiv!(F, x)
permute!(x, F.p) # allocates....
ldiv!(UnitLowerTriangular(F.factors), x)
ldiv!(UpperTriangular(F.factors), x)
x
end
blas_ldiv! (generic function with 2 methods)
julia> y = copy(x); @btime blas_ldiv!(F, y);
12.292 μs (6 allocations: 15.88 KiB)
julia> y = copy(x); @btime ldiv!(F, y);
22.916 μs (0 allocations: 0 bytes)
julia> F_s = lu(sparse(A));
julia> y = copy(x); @btime ldiv!(F_s, y);
13.042 μs (3 allocations: 7.94 KiB)
It’s possible that whatever LAPACK does to avoid allocations is what makes it slower?
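One of those allocations is the internal copy of F.p that permute! makes. If you want to check whether that matters, a pre-allocated variant could look like this (just a sketch; blas_ldiv2! and tmp are names I’m introducing, reusing F and x from above):

tmp = similar(x)               # scratch buffer, allocated once

function blas_ldiv2!(F, x, tmp)
    copyto!(tmp, x)
    for (i, j) in enumerate(F.p)
        x[i] = tmp[j]          # same row permutation as permute!(x, F.p), without copying F.p
    end
    ldiv!(UnitLowerTriangular(F.factors), x)
    ldiv!(UpperTriangular(F.factors), x)
    x
end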