I was trying to speed up a linear solve (Ax=b) where my A matrix is banded. Compared to just using a sparse matrix, the banded matrix took about twice as long.
using Random, SparseArrays, LinearAlgebra
using BenchmarkTools
using BandedMatrices
Random.seed!(123)
function myBandedMatrix(k, α::T) where T
L = spdiagm(-1 => ones(T,k-1), 1 => ones(T,k-1))
L[1,2] = L[end,end-1] = 2
L -= 2I
return I + α*L'L
end
k=1_000
α = 0.37
A = myBandedMatrix(k, α)
x = rand(k)
b = rand(k)
#Solving Ax=b
Alu = lu!(A);
AluBand = lu!(BandedMatrix(A));
@btime ldiv!($x,$Alu,$b);
@btime ldiv!($x,$AluBand,$b);
You can find out by doing @which ldiv!(res, A, b) in your REPL. If this returns a generic method, then there is no special banded implementation (or you’re not hitting it).
so the L factor is still a dense matrix (R is banded as well) whereas the sparse one produces two sparse factors. My guess is that using the dense matrix creates the performance difference.
Since the sparse LU returns (sparse) banded matrices, I’m wondering why the decomposition for BandedMatrix cannot achieve the same (but I’m also not that deep into the properties of all of these matrices and their decompositions)…
It looks like ldiv! goes to the same internal function, however when using the backslash operator the banded matrix goes to ArrayLayouts.jl which leads me to think I’m not hitting the specialized method.
Note that the L matrix will be dense (although it should be triangular), as the pivoted LU decomposition becomes LU = PB, and the right-hand side isn’t banded in general. However, we don’t actually need the L factor in the ldiv!, and use the banded internal representation of the factorization instead that stores the elements of L.
Unfortunately, this still doesn’t seem as performant as sparse solvers, but the performance may be improved to some extent by using MKL instead of OpenBLAS if using intel processors.
I see, thanks for the clarification! I removed the solution tag from my answer.
Could the problem still just be that there seem to be more elements stored in L when computing lu with the BandedMatrix compared to the SparseMatrixCSC?