Efficient representation of a sparse "pentadiagonal" matrix?

I am interested in the eigenspectrum of Hermitian matrices with a “pentadiagonal” structure, i.e. A[m,n] = 0 if abs(m-n) > 2. I am most interested in the eigenstates whose eigenvalues are close to zero. At present I am using Arpack.jl’s shift-invert functionality with SparseArrays. However, I was wondering if there is an efficient implementation for matrices with such bandwidths, similar to the built-in Tridiagonal etc., since the five diagonals themselves tend to be dense. My typical use case is a matrix of size around 50,000 × 50,000, from which I extract O(100) eigenvectors.
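
Here is roughly what my current approach looks like, as a minimal sketch (the random diagonals are just placeholders for my actual operator):

using Arpack, SparseArrays

# Placeholder pentadiagonal Hermitian matrix (real symmetric for simplicity)
n = 50_000
d0 = randn(n); d1 = randn(n - 1); d2 = randn(n - 2)
A = spdiagm(-2 => d2, -1 => d1, 0 => d0, 1 => d1, 2 => d2)

# Shift-invert about sigma = 0 to target the eigenvalues closest to zero
vals, vecs = eigs(A; nev = 100, sigma = 0.0, which = :LM)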

— Edit —

I have looked into BandedMatrices.jl, which supports the factorization needed for Arpack’s shift-invert. However, it gives me essentially the same runtime as plain sparse matrices.
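
The banded version I tried looks something like this (same placeholder diagonals as in the sketch above):

using BandedMatrices, Arpack

# Same operator stored with explicit lower/upper bandwidths (2, 2)
Ab = BandedMatrix(-2 => d2, -1 => d1, 0 => d0, 1 => d1, 2 => d2)

# Arpack's shift-invert now LU-factorizes the banded matrix internally
vals, vecs = eigs(Ab; nev = 100, sigma = 0.0)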

BandedMatrices.jl?

Just edited :slight_smile:

I can reproduce this: BandedMatrices.jl is about 10× faster than SparseArrays at LU-factorizing a pentadiagonal matrix, but is actually somewhat slower (about 1.5× here) at using the LU factorization to solve for a particular right-hand side:

julia> using BandedMatrices, SparseArrays, LinearAlgebra, BenchmarkTools

julia> m = 50000; B = brand(m,m,2,2); S = SparseMatrixCSC(B);

julia> FB = @btime lu($B); FS = @btime lu($S);
  1.622 ms (6 allocations: 3.09 MiB)
  16.828 ms (86 allocations: 61.04 MiB)

julia> b = randn(m);

julia> @btime $FB \ $b; @btime $FS \ $b;
  865.917 μs (3 allocations: 416.06 KiB)
  561.125 μs (6 allocations: 832.12 KiB)

Since a shift-invert eigensolve creates the factorization only once but uses it to solve many right-hand sides, you won’t see an advantage from BandedMatrices.jl for this application.
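
To put rough numbers on that using the timings above: k backsolves cost about 1.6 ms + k × 0.87 ms with BandedMatrices.jl versus 16.8 ms + k × 0.56 ms with SparseArrays, so the sparse version wins once k exceeds roughly 50, and extracting O(100) eigenpairs will require far more solves than that.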

(I’m surprised that the greater regularity of BandedMatrices.jl doesn’t allow the LU backsolve to be faster too. cc @dlfivefifty)


OpenBLAS is very slow for banded matrices. Try using MKL or AppleAccelerate.


Which representations does MKL support? Does it dispatch on `BandedMatrix`, or how else would I specify the structure?

Just type `using MKL` and BandedMatrices.jl will use it.
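
In case it helps, here is a quick way to check that the swap took effect (sketch; MKL targets x86, which is why AppleAccelerate was suggested for Apple silicon):

using MKL            # swaps the active BLAS/LAPACK via libblastrampoline
using LinearAlgebra

BLAS.get_config()    # should now list an MKL library rather than OpenBLAS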

I tried using AppleAccelerate (on an M4), and it didn’t make much difference in the benchmark above. The backsolve is still about 30% slower than with a plain sparse matrix.

It is just calling LAPACK’s `gbtrs` routine under the hood.
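
For the curious, here is a self-contained sketch of those LAPACK calls on a small pentadiagonal example, with the band storage packed by hand (purely illustrative):

using LinearAlgebra

# Small pentadiagonal test matrix with lower/upper bandwidth kl = ku = 2
n, kl, ku = 6, 2, 2
A = [abs(i - j) <= 2 ? randn() : 0.0 for i in 1:n, j in 1:n]

# LAPACK band storage: 2kl + ku + 1 rows; the first kl rows are
# workspace for fill-in generated by pivoting during the factorization
AB = zeros(2kl + ku + 1, n)
for j in 1:n, i in max(1, j - ku):min(n, j + kl)
    AB[kl + ku + 1 + i - j, j] = A[i, j]
end

ABf, ipiv = LAPACK.gbtrf!(kl, ku, n, copy(AB))         # banded LU (gbtrf)
b = randn(n)
x = LAPACK.gbtrs!('N', kl, ku, n, ABf, ipiv, copy(b))  # banded solve (gbtrs)
@assert A * x ≈ b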

So I guess it’s LAPACK that is slow?

Note that both LAPACK and BLAS have their own triangular solvers. It’s possible to write a custom `ldiv!` that uses BLAS instead of LAPACK, in which case we get the same speed:

julia> using AppleAccelerate, BandedMatrices, SparseArrays, LinearAlgebra, BenchmarkTools

julia> n = 1000; A = brand(n,n,2,2); F = lu(A); x = randn(n);

julia> function blas_ldiv!(F, x)
           permute!(x, F.p) # allocates: F.p builds a permutation vector, and permute! copies it
           ldiv!(UnitLowerTriangular(F.factors), x) # L solve via BLAS
           ldiv!(UpperTriangular(F.factors), x)     # U solve via BLAS
           x
       end
blas_ldiv! (generic function with 1 method)

julia> y = copy(x); @btime blas_ldiv!(F, y);
  12.292 μs (6 allocations: 15.88 KiB)

julia> y = copy(x); @btime ldiv!(F, y);
  22.916 μs (0 allocations: 0 bytes)

julia> F_s = lu(sparse(A));

julia> y = copy(x); @btime ldiv!(F_s, y);
  13.042 μs (3 allocations: 7.94 KiB)

It’s possible that whatever LAPACK does to avoid allocations is what makes it slower?
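
One way to test that would be to strip the remaining allocations out of blas_ldiv!, e.g. with a precomputed permutation and a scratch buffer (untested sketch; blas_ldiv_prealloc! is a hypothetical helper):

# Hypothetical allocation-free variant: p = F.p is computed once up front
# and tmp = similar(x) is reused, so only the BLAS solves remain
function blas_ldiv_prealloc!(F, x, p, tmp)
    copyto!(tmp, x)
    @inbounds for i in eachindex(x)
        x[i] = tmp[p[i]]            # apply the row permutation x ← x[p]
    end
    ldiv!(UnitLowerTriangular(F.factors), x)
    ldiv!(UpperTriangular(F.factors), x)
    return x
end

p = F.p; tmp = similar(x)           # precompute once, reuse across solves
blas_ldiv_prealloc!(F, copy(x), p, tmp)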