Linear solve slower with Banded Matrix

RobertGregg · September 25, 2024, 1:48am

I was trying to speed up a linear solve (Ax=b) where my A matrix is banded. Compared to just using a sparse matrix, the banded matrix took about twice as long.

using Random, SparseArrays, LinearAlgebra
using BenchmarkTools
using BandedMatrices

Random.seed!(123)

function myBandedMatrix(k, α::T) where T

    L = spdiagm(-1 => ones(T,k-1), 1 => ones(T,k-1))
    L[1,2] = L[end,end-1] = 2
    L -= 2I

    return I + α*L'L
end

k=1_000
α = 0.37
A = myBandedMatrix(k, α)
x = rand(k)
b = rand(k)

#Solving Ax=b
Alu = lu!(A);
AluBand = lu!(BandedMatrix(A));

@btime ldiv!($x,$Alu,$b);
@btime ldiv!($x,$AluBand,$b);

Gives the following times on my laptop:

8.800 μs (0 allocations: 0 bytes)
18.900 μs (0 allocations: 0 bytes)

Maybe ldiv! isn’t specialized for banded matrices?

gdalle · September 25, 2024, 5:28am

You can find out by doing @which ldiv!(res, A, b) in your REPL. If this returns a generic method, then there is no special banded implementation (or you’re not hitting it).

Sevi · September 25, 2024, 6:07am

Looks like the banded LU decomposition is not quite as performant as the sparse one:

julia> AluBand
BandedMatrices.BandedLU{Float64, BandedMatrix{Float64, Matrix{Float64}, Base.OneTo{Int64}}}
L factor:
1000×1000 Matrix{Float64}:
  1.0           0.0           0.0          …   0.0         0.0       0.0
 -0.273315      1.0           0.0              0.0         0.0       0.0
  0.0455525    -0.1762        1.0              0.0         0.0       0.0
 -1.73472e-18   0.0547027    -0.190059         0.0         0.0       0.0
  2.1684e-19    0.0           0.0536623        0.0         0.0       0.0
 -2.98664e-20   5.22499e-19  -1.73472e-18  …   0.0         0.0       0.0
  4.35574e-20  -1.45e-19      7.95367e-20      0.0         0.0       0.0
  ⋮                                        ⋱                         
  0.0           0.0           0.0              0.0         0.0       0.0
  0.0           0.0           0.0          …   0.0         0.0       0.0
  0.0           0.0           0.0              0.0         0.0       0.0
  0.0           0.0           0.0              1.0         0.0       0.0
  0.0           0.0           0.0             -0.206187    1.0       0.0
  0.0           0.0           0.0              0.0591741  -0.155663  1.0

julia> typeof(AluBand.L)
Matrix{Float64} (alias for Array{Float64, 2})

julia> typeof(AluBand.U)
BandedMatrix{Float64, Matrix{Float64}, Base.OneTo{Int64}}

julia> typeof(Alu.L)
SparseMatrixCSC{Float64, Int64}

julia> typeof(Alu.U)
SparseMatrixCSC{Float64, Int64}

so the L factor is still a dense matrix (R is banded as well) whereas the sparse one produces two sparse factors. My guess is that using the dense matrix creates the performance difference.

Since the sparse LU returns (sparse) banded matrices, I’m wondering why the decomposition for BandedMatrix cannot achieve the same (but I’m also not that deep into the properties of all of these matrices and their decompositions)…

RobertGregg · September 25, 2024, 6:23am

It looks like ldiv! goes to the same internal function, however when using the backslash operator the banded matrix goes to ArrayLayouts.jl which leads me to think I’m not hitting the specialized method.

ABand = BandedMatrix(A)
@btime $ABand \ $b; #87.300 μs (7 allocations: 86.55 KiB)
@btime $A \ $b;     #773.800 μs (80 allocations: 1.24 MiB)

jishnub · September 25, 2024, 6:29am

Please file an issue with BandedMatrices.jl

RobertGregg · September 25, 2024, 6:32am

Hmm this is very likely part of the problem because the L matrix seems to be banded. Good catch!

Sevi · September 25, 2024, 6:47am

This seems related?

jishnub · September 25, 2024, 8:50am

Note that the L matrix will be dense (although it should be triangular), as the pivoted LU decomposition becomes LU = PB, and the right-hand side isn’t banded in general. However, we don’t actually need the L factor in the ldiv!, and use the banded internal representation of the factorization instead that stores the elements of L.

github.com

JuliaLinearAlgebra/BandedMatrices.jl/blob/30ecd64dfa2d97afed7e357be4b742a51d1a4a04/src/banded/linalg.jl#L24-L30


      
          function ldiv!(A::BandedLU{T,<:BandedMatrix}, B::StridedVecOrMat{T}) where {T<:BlasFloat}
              m = size(A.factors,1)
              l,u = bandwidths(A.factors)
              data = bandeddata(A.factors)
              iszero(m) || LAPACK.gbtrs!('N', l, u-l, m, data, A.ipiv, B)
              B
          end

Unfortunately, this still doesn’t seem as performant as sparse solvers, but the performance may be improved to some extent by using MKL instead of OpenBLAS if using intel processors.

Sevi · September 25, 2024, 12:08pm

I see, thanks for the clarification! I removed the solution tag from my answer.

Could the problem still just be that there seem to be more elements stored in L when computing lu with the BandedMatrix compared to the SparseMatrixCSC?

Topic		Replies	Views
Matrix type for inversion Numerics linearalgebra , sparse	2	274	December 28, 2023
Sparse solve vs BandedMatrix; time and allocation surprise Performance	29	2437	May 16, 2022
Speed of ldiv! for banded matrices in BandedMatrices.jl Performance	2	118	February 9, 2025
Increasing the solution speed of sparse linear system General Usage	14	507	May 19, 2024
Help on how to describe a Matrix used for 2D Finite Differences using BlockBandedMatrices.jl Modelling & Simulations matrices , blockdiagonals	9	381	August 25, 2023

Linear solve slower with Banded Matrix

Related topics