Speed of ldiv! for banded matrices in BandedMatrices.jl

BandedMatrices.jl is innocent, I think. Profiling it, nearly all the time is spent in the LAPACK routines dgbtrs and sgbtrs, so I think this is just the way LAPACK performs on these problems. If you run it with substantially larger bandwidths (e.g. 100 instead of 10), I think you will find that single precision starts to pull ahead. These routines do not use blocking, so your timing is going to be dominated by memory access.
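As a rough check of that prediction, something along these lines could be timed. This is a sketch, not from the thread: brand (BandedMatrices' random banded-matrix constructor), the size n, and the bandwidths are all illustrative, and it assumes lu on a BandedMatrix dispatches to the banded LAPACK factorization.

    using BandedMatrices, LinearAlgebra, BenchmarkTools

    n = 10_000
    for T in (Float64, Float32), bw in (10, 100)
        A = brand(T, n, n, bw, bw)   # random n×n banded matrix, bandwidths (bw, bw)
        F = lu(A)                    # banded LU, LAPACK [ds]gbtrf
        rhs = rand(T, n)
        print(rpad("$T, bandwidth $bw:", 28))
        @btime ldiv!($F, x) setup = (x = copy($rhs))   # LAPACK [ds]gbtrs
    end

If the memory-access explanation is right, the Float32 timings should improve relative to Float64 as the bandwidth grows.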

The code from dgbtrs that does forward substitution using L is

         IF( LNOTI ) THEN
            DO 10 J = 1, N - 1
               LM = MIN( KL, N-J )
               L = IPIV( J )
               IF( L.NE.J )
     $            CALL DSWAP( NRHS, B( L, 1 ), LDB, B( J, 1 ), LDB )
               CALL DGER( LM, NRHS, -ONE, AB( KD+1, J ), 1, B( J, 1 ),
     $                    LDB, B( J+1, 1 ), LDB )
   10       CONTINUE
         END IF

The code for single precision is similar. The routine DSWAP exchanges two rows of B; since B is stored in column-major order, the elements of a row are LDB apart in memory, so the swap traverses both rows with a large stride, which is slow. Note that the right-hand side is B and the LU factorization is in band storage in AB.
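To see how much that access pattern can matter, here is a hypothetical micro-benchmark (mine, not from dgbtrs) comparing a row swap against a column swap on a column-major matrix; the row swap is the strided pattern DSWAP faces here, and the matrix size is arbitrary.

    using BenchmarkTools

    # Element-by-element swap of two equal-length views, like DSWAP.
    function swap!(x, y)
        for k in eachindex(x, y)
            x[k], y[k] = y[k], x[k]
        end
    end

    B = rand(4096, 4096)   # column major, as in Fortran

    # Rows: consecutive elements are 4096 Float64s apart in memory (strided).
    @btime swap!(u, v) setup = (u = view($B, 1, :); v = view($B, 2, :))

    # Columns: consecutive elements are adjacent in memory (unit stride).
    @btime swap!(u, v) setup = (u = view($B, :, 1); v = view($B, :, 2))

With NRHS = 1, though, each DSWAP call only touches a single element per row, so this cost shouldn't dominate in the single right-hand-side case discussed below.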

Edit: I went out on a limb with a guess about what was going on that, on reflection, doesn’t seem very plausible to me with a single right-hand side. So I’ll speculate a bit less and just say that this is a small matrix in the sense that the band storage for the LU factorization has only 31 rows (2*KL + KU + 1 = 31 for bandwidths KL = KU = 10). There may be something interesting going on with memory access, but LAPACK isn’t known for being fast on small matrices, and there could also be some fixed overhead that is independent of the precision. The fact that single precision is slower than you expected relative to double precision at such a small bandwidth is not very surprising. It would be interesting to try to profile the Fortran code.
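Short of profiling the Fortran directly with an external tool, Julia’s built-in sampling profiler can at least confirm where the samples land. A minimal sketch, under the same brand/lu assumptions as above:

    using BandedMatrices, LinearAlgebra, Profile

    A = brand(10_000, 10_000, 10, 10)   # bandwidths (10, 10), as in the question
    F = lu(A)
    x = rand(10_000)
    ldiv!(F, copy(x))                   # warm up / force compilation

    Profile.clear()
    @profile for _ in 1:10_000
        ldiv!(F, copy(x))               # calls LAPACK dgbtrs
    end
    Profile.print(C = true, mincount = 50)   # C = true shows the LAPACK frames

That won’t show hot spots inside dgbtrs itself, but it would rule out overhead on the Julia side.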
