LU factorisation of a block tridiagonal square matrix

Following up on “Fastest way to perform A * B * A’”

Given a block tridiagonal matrix X of size N×N, with blocks of size M×M (M<N and N%M=0), I define A = inv(I-X). I also define B, which has the same size as A but is block diagonal with 2×2 blocks.

Now, the fastest way I know to extract just a few elements of the expression R = A B A’ involves an LU factorisation of I-X. I was hoping to exploit the block tridiagonal structure of X to speed the process up further. Are there any built-in routines that exploit this?


using LinearAlgebra
using MKL
using BenchmarkTools

N = 10000;
M = 500;

##---For extracting the "few elements" from R---
m = 100;
aa = rand(1:N, m);
bb = rand(1:N, m);
    

##---Forming matrices X and B---
NbM = N ÷ M;

Matdiag = rand(Float64, M, M);
Matoffdiag = rand(Float64, M, M);

Id = Diagonal(ones(NbM));
tridiagu = Tridiagonal(ones(Float64, NbM-1), zeros(Float64, NbM), zeros(Float64, NbM-1));
tridiagd = Tridiagonal(zeros(Float64, NbM-1), zeros(Float64, NbM), ones(Float64, NbM-1));

X = kron(Id, Matdiag) + kron(tridiagu, Matoffdiag) + kron(tridiagd, Matoffdiag);
B = kron(I(N ÷ 2), [1 2; 2 1]);

##---Good LU based routine (w/o block tridiagonal exploitation)---
function LUinv(X,B,a,b)
    LU = lu(I - X');

    Aa = zeros(eltype(LU), size(X,1), length(a));
    for i = 1:length(a)
        Aa[a[i], i] = 1;
    end
    ldiv!(LU, Aa);

    Ab = zeros(eltype(LU), size(X,1), length(b));
    for i = 1:length(b)
        Ab[b[i], i] = 1;
    end
    ldiv!(LU, Ab);

    Rab = length(a) < length(b) ? (Aa' * B) * Ab : Aa' * (B * Ab);

    return Rab
end

##---Basic matrix inversion---
function basicinv(X,B,a,b)
    G = inv(I-X);
    Ga = G';

    Rab = G[a,:]*B*Ga[:,b];

    return Rab
end

##---Comparison---
LUinv(X,B,aa,bb) ≈ basicinv(X,B,aa,bb)

##---Benchmarking---
@btime LUinv($X,$B,$aa,$bb);
@btime basicinv($X,$B,$aa,$bb);

I don’t know that we have a nice “recursive” LU that could exploit a tridiagonal matrix-of-matrices, but such a routine would obviously be a nice way to do this. You could write your own (LU isn’t that hard, and you could still use BLAS for almost all of it – and the tridiagonal case shouldn’t require any pivoting), but that’s still a bit of work.

That said, I’ve tried to exploit matrix structures of this flavor in the past, and I was actually very impressed by the performance of “unstructured” sparse arrays for this purpose. I found that they got surprisingly close to the performance of painstakingly written specialized solvers. Your MWE is roughly 15% dense (58 of the 400 M×M blocks are nonzero), which is not as sparse as one might wish, but still sparse enough to be worth a try.
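As a concrete sketch of the sparse-array approach (the function mirrors `LUinv` above; only the factorisation changes, and the name `LUinv_sparse` is my own), converting `I - X'` to a `SparseMatrixCSC` hands the factorisation to the sparse LU (UMFPACK):

```julia
using LinearAlgebra, SparseArrays

# Sparse variant of LUinv: same logic as the dense routine above,
# but the factorisation is done by UMFPACK on a SparseMatrixCSC
# instead of dense LAPACK.
function LUinv_sparse(X, B, a, b)
    F = lu(sparse(I - X'))            # sparse LU factorisation

    Aa = zeros(size(X, 1), length(a))
    for (i, ai) in enumerate(a)
        Aa[ai, i] = 1.0               # unit vectors selecting rows a of A
    end
    Aa = F \ Aa                       # columns a of A' = inv(I - X')

    Ab = zeros(size(X, 1), length(b))
    for (i, bi) in enumerate(b)
        Ab[bi, i] = 1.0
    end
    Ab = F \ Ab

    return length(a) < length(b) ? (Aa' * B) * Ab : Aa' * (B * Ab)
end
```

It should be a drop-in replacement: `LUinv_sparse(X, B, aa, bb)` in place of `LUinv(X, B, aa, bb)`.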

If speed is important and precision is not, you might also see some speedup from dropping to Float32.
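A sketch of that conversion (with small stand-in matrices here; in the MWE it would just be `Float32.(X)` and `Float32.(B)` before calling `LUinv`):

```julia
using LinearAlgebra

# Narrowing to Float32 halves memory traffic and makes lu/ldiv!
# dispatch to the single-precision LAPACK kernels.
n = 100
X32 = Float32.(rand(n, n) ./ (2n))   # stand-in for the MWE's X
F = lu(I - X32')                     # single-precision LU
y = F \ ones(Float32, n)             # solve stays in Float32
eltype(y)                            # Float32
```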


Maybe a Julia interface to this:
https://www.intel.com/content/www/us/en/docs/onemkl/cookbook/2021-4/slv-sys-lin-eq-lu-factor-blk-tridiag-coeff-mat.html

It could be easier to just rewrite this code in Julia — at first glance, it looks like this code may be just a block version of the Thomas algorithm. You may run into numerical instabilities if the matrix is not SPD or similarly well-behaved, though; I’m not sure.
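A minimal sketch of such a block Thomas solver (my own `block_thomas`, not a library routine; it does no pivoting between blocks, so it assumes the modified diagonal blocks stay invertible, which holds for SPD or diagonally dominant matrices):

```julia
using LinearAlgebra

# Block Thomas solve of T x = d, where T is block tridiagonal with
# diagonal blocks D[1..n], superdiagonal blocks U[1..n-1], and
# subdiagonal blocks L[1..n-1], each M×M; d is a vector of n
# length-M right-hand-side vectors.
function block_thomas(D, U, L, d)
    n = length(D)
    Dp = [copy(D[1])]                 # modified diagonal blocks
    dp = [copy(d[1])]                 # modified right-hand sides
    for i in 2:n                      # forward elimination
        W = L[i-1] / Dp[i-1]          # W = L[i-1] * inv(Dp[i-1])
        push!(Dp, D[i] - W * U[i-1])
        push!(dp, d[i] - W * dp[i-1])
    end
    x = Vector{typeof(dp[1])}(undef, n)
    x[n] = Dp[n] \ dp[n]
    for i in n-1:-1:1                 # back substitution
        x[i] = Dp[i] \ (dp[i] - U[i] * x[i+1])
    end
    return x
end
```

Each step factors only one M×M block, so the cost is O(NbM · M³) instead of O(N³), and all the heavy lifting (`/`, `\`, `*` on M×M blocks) still goes through BLAS/LAPACK.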
