Efficient representation of a sparse "pentadiagonal" matrix?

I am interested in the eigenspectrum of Hermitian matrices with a “pentadiagonal” structure, i.e. A[m,n] = 0 if abs(m-n) > 2. I am most interested in the eigenstates whose eigenvalues are close to zero. At present I am using Arpack.jl’s shift-invert functionality with SparseArrays. However, I was wondering if there is an efficient implementation for matrices with such bandwidths, similar to the built-in Tridiagonal etc., since the five diagonals themselves tend to be dense. My typical use case is a matrix of size around 50,000 × 50,000, from which I extract O(100) eigenvectors.
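
Here is roughly what my current approach looks like, as a minimal sketch (the random diagonals are just placeholders for my actual operator):

using Arpack, SparseArrays

# Placeholder pentadiagonal Hermitian matrix (real symmetric for simplicity)
n = 50_000
d0 = randn(n); d1 = randn(n - 1); d2 = randn(n - 2)
A = spdiagm(-2 => d2, -1 => d1, 0 => d0, 1 => d1, 2 => d2)

# Shift-invert about sigma = 0 to target the eigenvalues closest to zero
vals, vecs = eigs(A; nev = 100, sigma = 0.0, which = :LM)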

— Edit —

I have looked into BandedMatrices.jl, which supports the factorization needed for Arpack’s shift-invert. However, it gives me essentially the same runtime as plain sparse matrices.
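
The banded version I tried looks something like this (same placeholder diagonals as in the sketch above):

using BandedMatrices, Arpack

# Same operator stored with explicit lower/upper bandwidths (2, 2)
Ab = BandedMatrix(-2 => d2, -1 => d1, 0 => d0, 1 => d1, 2 => d2)

# Arpack's shift-invert now LU-factorizes the banded matrix internally
vals, vecs = eigs(Ab; nev = 100, sigma = 0.0)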

BandedMatrices.jl?

Just edited :slight_smile:

I can reproduce this: BandedMatrices.jl is about 10× faster than SparseArrays at LU-factorizing a pentadiagonal matrix, but is actually somewhat slower (about 1.5× here) at using the LU factorization to solve for a particular right-hand side:

julia> using BandedMatrices, SparseArrays, LinearAlgebra, BenchmarkTools

julia> m = 50000; B = brand(m,m,2,2); S = SparseMatrixCSC(B);

julia> FB = @btime lu($B); FS = @btime lu($S);
  1.622 ms (6 allocations: 3.09 MiB)
  16.828 ms (86 allocations: 61.04 MiB)

julia> b = randn(m);

julia> @btime $FB \ $b; @btime $FS \ $b;
  865.917 μs (3 allocations: 416.06 KiB)
  561.125 μs (6 allocations: 832.12 KiB)

Since a shift-invert eigensolve creates the factorization only once but uses it to solve many right-hand sides, you won’t see an advantage from BandedMatrices.jl for this application.
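
To put rough numbers on that using the timings above: k backsolves cost about 1.6 ms + k × 0.87 ms with BandedMatrices.jl versus 16.8 ms + k × 0.56 ms with SparseArrays, so the sparse version wins once k exceeds roughly 50, and extracting O(100) eigenpairs will require far more solves than that.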

(I’m surprised that the greater regularity of BandedMatrices.jl doesn’t allow the LU backsolve to be faster too. cc @dlfivefifty)


OpenBLAS is very slow for banded matrices. Try using MKL or AppleAccelerate.


Which representations does MKL support? Does it dispatch on `BandedMatrix`, or how else would I specify the structure?

Just type `using MKL` and BandedMatrices.jl will use it.
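
In case it helps, here is a quick way to check that the swap took effect (sketch; MKL targets x86, which is why AppleAccelerate was suggested for Apple silicon):

using MKL            # swaps the active BLAS/LAPACK via libblastrampoline
using LinearAlgebra

BLAS.get_config()    # should now list an MKL library rather than OpenBLAS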

I tried using AppleAccelerate (on an M4), and it didn’t make much difference in the benchmark above. The backsolve is still about 30% slower than with a plain sparse matrix.

It is just calling LAPACK’s `gbtrs` routine under the hood.
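
For the curious, here is a self-contained sketch of those LAPACK calls on a small pentadiagonal example, with the band storage packed by hand (purely illustrative):

using LinearAlgebra

# Small pentadiagonal test matrix with lower/upper bandwidth kl = ku = 2
n, kl, ku = 6, 2, 2
A = [abs(i - j) <= 2 ? randn() : 0.0 for i in 1:n, j in 1:n]

# LAPACK band storage: 2kl + ku + 1 rows; the first kl rows are
# workspace for fill-in generated by pivoting during the factorization
AB = zeros(2kl + ku + 1, n)
for j in 1:n, i in max(1, j - ku):min(n, j + kl)
    AB[kl + ku + 1 + i - j, j] = A[i, j]
end

ABf, ipiv = LAPACK.gbtrf!(kl, ku, n, copy(AB))         # banded LU (gbtrf)
b = randn(n)
x = LAPACK.gbtrs!('N', kl, ku, n, ABf, ipiv, copy(b))  # banded solve (gbtrs)
@assert A * x ≈ b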

So I guess it’s LAPACK that is slow?

Note that both LAPACK and BLAS have their own triangular solvers. It’s possible to write a custom `ldiv!` that uses BLAS instead of LAPACK, in which case we get the same speed:

julia> using AppleAccelerate, BandedMatrices, SparseArrays, LinearAlgebra, BenchmarkTools

julia> n = 1000; A = brand(n,n,2,2); F = lu(A); x = randn(n);

julia> function blas_ldiv!(F, x)
           permute!(x, F.p) # allocates: F.p builds a permutation vector, and permute! copies it
           ldiv!(UnitLowerTriangular(F.factors), x) # L solve via BLAS
           ldiv!(UpperTriangular(F.factors), x)     # U solve via BLAS
           x
       end
blas_ldiv! (generic function with 1 method)

julia> y = copy(x); @btime blas_ldiv!(F, y);
  12.292 μs (6 allocations: 15.88 KiB)

julia> y = copy(x); @btime ldiv!(F, y);
  22.916 μs (0 allocations: 0 bytes)

julia> F_s = lu(sparse(A));

julia> y = copy(x); @btime ldiv!(F_s, y);
  13.042 μs (3 allocations: 7.94 KiB)

It’s possible that whatever LAPACK does to avoid allocations is what makes it slower?
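
One way to test that would be to strip the remaining allocations out of blas_ldiv!, e.g. with a precomputed permutation and a scratch buffer (untested sketch; blas_ldiv_prealloc! is a hypothetical helper):

# Hypothetical allocation-free variant: p = F.p is computed once up front
# and tmp = similar(x) is reused, so only the BLAS solves remain
function blas_ldiv_prealloc!(F, x, p, tmp)
    copyto!(tmp, x)
    @inbounds for i in eachindex(x)
        x[i] = tmp[p[i]]            # apply the row permutation x ← x[p]
    end
    ldiv!(UnitLowerTriangular(F.factors), x)
    ldiv!(UpperTriangular(F.factors), x)
    return x
end

p = F.p; tmp = similar(x)           # precompute once, reuse across solves
blas_ldiv_prealloc!(F, copy(x), p, tmp)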