Linear solve slower with Banded Matrix

I was trying to speed up a linear solve (Ax=b) where my A matrix is banded. Compared to just using a sparse matrix, the banded matrix took about twice as long.

using Random, SparseArrays, LinearAlgebra
using BenchmarkTools
using BandedMatrices

Random.seed!(123)

function myBandedMatrix(k, α::T) where T

    L = spdiagm(-1 => ones(T,k-1), 1 => ones(T,k-1))
    L[1,2] = L[end,end-1] = 2
    L -= 2I

    return I + α*L'L
end

k=1_000
α = 0.37
A = myBandedMatrix(k, α)
x = rand(k)
b = rand(k)

#Solving Ax=b
Alu = lu!(A);
AluBand = lu!(BandedMatrix(A));

@btime ldiv!($x,$Alu,$b);
@btime ldiv!($x,$AluBand,$b);

Gives the following times on my laptop:

8.800 μs (0 allocations: 0 bytes)
18.900 μs (0 allocations: 0 bytes)

Maybe ldiv! isn’t specialized for banded matrices?

You can find out by doing @which ldiv!(res, A, b) in your REPL. If this returns a generic method, then there is no special banded implementation (or you’re not hitting it).

Looks like the banded LU decomposition is not quite as performant as the sparse one:

julia> AluBand
BandedMatrices.BandedLU{Float64, BandedMatrix{Float64, Matrix{Float64}, Base.OneTo{Int64}}}
L factor:
1000×1000 Matrix{Float64}:
  1.0           0.0           0.0          …   0.0         0.0       0.0
 -0.273315      1.0           0.0              0.0         0.0       0.0
  0.0455525    -0.1762        1.0              0.0         0.0       0.0
 -1.73472e-18   0.0547027    -0.190059         0.0         0.0       0.0
  2.1684e-19    0.0           0.0536623        0.0         0.0       0.0
 -2.98664e-20   5.22499e-19  -1.73472e-18  …   0.0         0.0       0.0
  4.35574e-20  -1.45e-19      7.95367e-20      0.0         0.0       0.0
  ⋮                                        ⋱                         
  0.0           0.0           0.0              0.0         0.0       0.0
  0.0           0.0           0.0          …   0.0         0.0       0.0
  0.0           0.0           0.0              0.0         0.0       0.0
  0.0           0.0           0.0              1.0         0.0       0.0
  0.0           0.0           0.0             -0.206187    1.0       0.0
  0.0           0.0           0.0              0.0591741  -0.155663  1.0

julia> typeof(AluBand.L)
Matrix{Float64} (alias for Array{Float64, 2})

julia> typeof(AluBand.U)
BandedMatrix{Float64, Matrix{Float64}, Base.OneTo{Int64}}

julia> typeof(Alu.L)
SparseMatrixCSC{Float64, Int64}

julia> typeof(Alu.U)
SparseMatrixCSC{Float64, Int64}

so the L factor is still a dense matrix (R is banded as well) whereas the sparse one produces two sparse factors. My guess is that using the dense matrix creates the performance difference.

Since the sparse LU returns (sparse) banded matrices, I’m wondering why the decomposition for BandedMatrix cannot achieve the same (but I’m also not that deep into the properties of all of these matrices and their decompositions)…

It looks like ldiv! goes to the same internal function, however when using the backslash operator the banded matrix goes to ArrayLayouts.jl which leads me to think I’m not hitting the specialized method.

ABand = BandedMatrix(A)
@btime $ABand \ $b; #87.300 μs (7 allocations: 86.55 KiB)
@btime $A \ $b;     #773.800 μs (80 allocations: 1.24 MiB)

Please file an issue with BandedMatrices.jl

1 Like

Hmm this is very likely part of the problem because the L matrix seems to be banded. Good catch!

This seems related?

Note that the L matrix will be dense (although it should be triangular), as the pivoted LU decomposition becomes LU = PB, and the right-hand side isn’t banded in general. However, we don’t actually need the L factor in the ldiv!, and use the banded internal representation of the factorization instead that stores the elements of L.

Unfortunately, this still doesn’t seem as performant as sparse solvers, but the performance may be improved to some extent by using MKL instead of OpenBLAS if using intel processors.

I see, thanks for the clarification! I removed the solution tag from my answer.

Could the problem still just be that there seem to be more elements stored in L when computing lu with the BandedMatrix compared to the SparseMatrixCSC?