Speeding up solving `Ax=b` repeatedly when `A` is partially updated

I need to solve Ax=b, where x and b are both vectors of length N+1 and A is a matrix of size (N+1, N+1).

A has a certain structure: A[1:N, N+1] = 1, A[N+1, 1:N] = 1, A[N+1, N+1] = 0, and A[1:N, 1:N] = matD + Diagonal(d). matD is a fixed matrix for carrying out differentiation via finite differences. d is a Vector of length N. Both d and b change for each solve.

Since I need to solve for x repeatedly for different parameters (i.e. different d and b), I wonder if there is an efficient way to solve this problem, both from a linear-algebra point of view (e.g. is there a special factorization for A with this specific structure?) and from a Julia implementation point of view (e.g. tricks to reduce memory allocation). Right now, this is what I have:

```julia
# A_fixed is the non-mutating part of A, already pre-calculated

for i = 1:100  # say, loop 100 times over 100 different sets of parameters

    # ... some code to calculate b and d ...

    copy!(Amat, A_fixed)  # Amat is a pre-allocated matrix of size (N+1, N+1)
    for n = 1:N
        Amat[n, n] += d[n]
    end
    Alu = lu!(Amat)
    ldiv!(x, Alu, b)      # x is a pre-allocated vector of size N+1
end
```

Are there better ways to solve for x?


There may be a better way, if the changes in Amat from one iteration to the next are “small” (in a sense made more precise below). Back in the stone age of computing (the 1990s), solving linear systems of equations was still pretty expensive even for matrices of order as small as a few hundred to several thousand. My fellow cavemen and I needed to solve many similar systems as we swept a parameter, frequency, over its analysis bandwidth. We adapted a well-known scheme called iterative refinement for this purpose. Here is the basic idea of our modified iterative refinement (MIR) algorithm.

Suppose one wishes to solve the matrix equation Ax = b and that we have already computed the inverse (in practice, we’ll solve using the LU decomposition instead of explicitly forming the inverse) of a matrix A_0 which is “nearly equal” to A. We compute a series of approximations to x as follows:

$$
\begin{array}{ll} x^{(1)} = A_0^{-1} b, & r^{(1)} = b - A x^{(1)} \\ x^{(2)} = x^{(1)} + A_0^{-1} r^{(1)}, & r^{(2)} = b - A x^{(2)} \\ \hphantom{x^{(1)}} \vdots & \hphantom{x^{(1)}} \vdots \\ x^{(i)} = x^{(i-1)} + A_0^{-1} r^{(i-1)}, & r^{(i)} = b - A x^{(i)} \\ \hphantom{x^{(1)}} \vdots & \hphantom{x^{(1)}} \vdots \end{array}
$$

It is not hard to show that the i-th residual satisfies r^{(i)} = \left(I - AA_0^{-1} \right)^{i-1} r^{(1)}, so that r^{(i)} \rightarrow 0 if the spectral radius of I - AA_0^{-1} is less than 1. Inverting a matrix of order m (or obtaining its LU decomposition) is an O(m^3) operation, while solving the factored system and matrix-vector multiplication are only O(m^2). Therefore, this algorithm can save execution time if convergence is rapid enough (fewer than approximately m iterations are required).

Here is pseudocode for the algorithm from our paper on the subject.
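(The pseudocode itself is not reproduced here, but the iteration described above can be sketched in Julia as follows. This is a minimal sketch, not the paper's implementation: `mir_solve`, the tolerance, and the iteration cap are all illustrative names and choices, and `A0lu` is assumed to be a precomputed LU factorization of the nearby matrix A_0.)

```julia
using LinearAlgebra

# Modified iterative refinement (MIR) sketch: solve A*x = b reusing a
# factorization A0lu = lu(A0) of a nearby matrix A0.  Converges when the
# spectral radius of I - A*inv(A0) is less than 1.
function mir_solve(A, b, A0lu; tol=1e-10, maxiter=50)
    x = A0lu \ b            # x⁽¹⁾ = A0⁻¹ b
    r = b - A * x           # r⁽¹⁾ = b - A x⁽¹⁾
    for _ in 1:maxiter
        norm(r) ≤ tol * norm(b) && break
        x += A0lu \ r       # x⁽ⁱ⁾ = x⁽ⁱ⁻¹⁾ + A0⁻¹ r⁽ⁱ⁻¹⁾
        r = b - A * x       # r⁽ⁱ⁾ = b - A x⁽ⁱ⁾
    end
    return x
end
```

Each pass costs only the O(m^2) triangular solve and matrix-vector product, so reusing one factorization across many nearby systems is where the savings come from.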

For our application, this technique worked very well, resulting in substantial savings in execution time. Of course, YMMV. Hope it helps.


I remember from my numerical linear algebra course something about Sherman-Morrison-Woodbury for when you have a set of rank-1 updates. Not sure if it is applicable in your case, but it might be worth a look.

Not applicable to diagonal updates, unfortunately, since they are full rank (unless the diagonal is sparse).
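(For the sparse-diagonal case it does apply: if only k ≪ N diagonal entries change, the update can be written as U*Vᵀ with k columns and solved through the k×k capacitance matrix. A minimal sketch, assuming `A0lu` is a precomputed LU factorization of the unmodified matrix; `smw_solve` is an illustrative name, not a library function:)

```julia
using LinearAlgebra

# Sherman–Morrison–Woodbury: for A = A0 + U*Vᵀ with U, V of size m×k,
#   A⁻¹ b = A0⁻¹ b - A0⁻¹ U * (I + Vᵀ A0⁻¹ U)⁻¹ * Vᵀ A0⁻¹ b.
# For a sparse diagonal update, U and V select/scale the changed entries.
function smw_solve(A0lu, U, V, b)
    z = A0lu \ b           # A0⁻¹ b
    W = A0lu \ U           # A0⁻¹ U  (m×k)
    S = I + V' * W         # small k×k capacitance matrix
    return z - W * (S \ (V' * z))
end
```

The cost per solve is O(m^2 k + k^3) instead of the O(m^3) of refactoring, so it only pays off when k stays small.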


I don’t think you will find a generally applicable method for handling diagonal updates that is better than starting from scratch. (Unless the changes to the diagonal and right-hand side are suitably small, in which case there should be iterative options, as already noted in the nice suggestion by @PeterSimon.) If you are making bigger changes, you might instead focus on finding the best algorithm for Amat without updating. That might not be a dense LU factorization. A good choice of algorithm will depend on the details of the problem, notably on A_fixed and d[n]. How sophisticated an algorithm is worthwhile is also constrained by N: if it is not really large, you probably don’t want something with better asymptotic complexity but a lot of overhead. At the very least, it looks like you are probably using dense LU on a sparse problem.
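(To illustrate that last point: if matD comes from a standard finite-difference stencil it is banded, so A is sparse apart from the border row and column, and `lu` on a `SparseMatrixCSC` dispatches to the sparse UMFPACK factorization. A minimal sketch; the tridiagonal matD below is a stand-in for the real one:)

```julia
using LinearAlgebra, SparseArrays

# Assemble the bordered system as a sparse matrix; concatenating sparse
# with dense blocks yields a SparseMatrixCSC, and lu() then uses the
# sparse (UMFPACK) factorization instead of a dense one.
N = 100
matD = spdiagm(-1 => ones(N - 1), 0 => fill(-2.0, N), 1 => ones(N - 1))
d = rand(N)
A = [matD + spdiagm(0 => d)  ones(N, 1);
     ones(1, N)              0.0]
b = rand(N + 1)
x = lu(A) \ b
```

For a banded interior block plus a dense border, the sparse factorization typically costs far less than the O(N^3) dense LU, though the border row and column do introduce some fill-in.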
