What is the advantage of first decomposing and then left-dividing every time over inverting once and multiplying by the inverse every time?
Can't explain to you why, but at least if I do this, things slow down by twofold. I'd also be interested in why. From my point of view at least it shouldn't be slower?
For a dense m \times m matrix, using the factorization to left-divide every time involves:
- Computing the factorization (typically LU) A = LU (technically PA = LU including pivots, but let me leave that out for simplicity). This has \Theta(m^3) cost, but is only done once for all right-hand sides. F = lu(A) stores L, U, and P implicitly in F.
- Using the factorization to solve Ax = LUx = b for x for each right-hand side, which has \Theta(m^2) cost. In particular, solving LUx = b \Longleftrightarrow x = U^{-1} (L^{-1} b) involves two \Theta(m^2) triangular solves: c = L^{-1} b \Longleftrightarrow Lc = b is computed by forward substitution (top to bottom), and x = U^{-1} c \Longleftrightarrow Ux = c is computed by backsubstitution (bottom to top). x = F \ b does this, as in the sketch below.
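Here is a minimal sketch of the factor-once, solve-many pattern (the matrix and right-hand sides are just random placeholders):

using LinearAlgebra

m = 1000
A = rand(m, m)        # placeholder dense matrix
F = lu(A)             # Θ(m^3), done once; stores L, U, and the pivots

for _ in 1:10         # each solve reuses F at Θ(m^2) cost
    b = rand(m)
    x = F \ b         # forward substitution with L, then backsubstitution with U
    @assert A * x ≈ b # residual check (holds for a reasonably well-conditioned A)
end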
Note that using the factorization to solve Ax = b has \Theta(m^2) cost similar to a matrix-vector multiply, and indeed computing F \ b is roughly comparable to the cost of multiplying x = A^{-1} b if you had precomputed the matrix inverse (though the constant factors for the latter are a bit better).
Inversion of a dense m \times m matrix involves two steps:
- Computing the factorization (typically LU) A = LU (technically PA = LU including pivots, but let me leave that out for simplicity). This has \Theta(m^3) cost.
- Using the factorization to solve Ax = LUx = b for m right-hand sides, corresponding to AX = I. Each right-hand side (each column of I) has \Theta(m^2) cost, so all of these solves together have \Theta(m^3) cost.
(You may have learned the Gauss–Jordan algorithm for A^{-1} in school. This is equivalent to those two steps: Gaussian elimination on A is equivalent to LU factorization, applying the same elimination steps to I is equivalent to the forward substitution C = L^{-1} I, and then the "upwards" elimination is equivalent to the backsubstitution X = U^{-1} C.) Given the inverse,
- Applying A^{-1} to a vector is \Theta(m^2) cost, comparable to a triangular solve (though the constant factors are a bit better).
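As an illustrative sketch of that equivalence (not how you would normally compute an inverse in practice; the matrix here is a random placeholder), the m solves against the columns of I reproduce inv:

using LinearAlgebra

m = 200
A = rand(m, m)                     # placeholder dense matrix
F = lu(A)                          # PA = LU, Θ(m^3)

X = F \ Matrix{Float64}(I, m, m)   # m solves for AX = I: Θ(m^2) each, Θ(m^3) total
@assert X ≈ inv(A)                 # matches the built-in inverse up to roundoff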
Thus, matrix inversion is equivalent to solving m right-hand sides, so it is definitely not worth it if you are solving \ll m right-hand sides, e.g. you are doing just a couple of Newton steps. It might be worth it if you are solving \gg m right-hand sides, because the constant factors for a matrix-vector multiply are better than for two triangular solves, but this seems to be a relatively rare situation.
In general, you should get in the habit of never computing matrix inverses. Read A^{-1} b as âsolve Ax=b by the best-available method.â
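In Julia that habit just means writing the backslash solve instead of forming the inverse, e.g.:

x = A \ b        # good: factorizes (LU for a general square A) and solves Ax = b
x = inv(A) * b   # avoid: forms the entire inverse just to apply it once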
PS. Computing inverses is much, much worse if A is a sparse matrix, because in that case L and U are sparse but A^{-1} is typically dense.
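A rough way to see the sparse case (sizes and sparsity below are arbitrary placeholders) is to compare the storage of the sparse factors with what a dense inverse would need:

using LinearAlgebra, SparseArrays

n = 10_000
A = sprand(n, n, 5/n) + 10I   # random sparse matrix, shifted to keep it nonsingular
b = rand(n)

F = lu(A)                     # sparse LU (UMFPACK); the factors remain sparse
x = F \ b                     # cheap sparse triangular solves

nnz(A)                        # only a handful of nonzeros per row
nnz(F.L) + nnz(F.U)           # some fill-in, but still far from dense
# inv(A) would be essentially dense: n^2 = 10^8 entries ≈ 800 MB of Float64s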
PPS. Sometimes, people argue that solving Ax = b by the matrix inversion is much less accurate because it involves additional steps. Druinsky and Toledo (2012) argue that this is not typically a major concern, however.
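If you want to check this on your own problem, comparing relative residuals of the two approaches is straightforward, e.g. something along these lines:

using LinearAlgebra

A = rand(1000, 1000); b = rand(1000)

x_fact = lu(A) \ b     # factorize and solve
x_inv  = inv(A) * b    # apply an explicitly computed inverse

norm(A * x_fact - b) / norm(b)   # relative residual of the factorization solve
norm(A * x_inv  - b) / norm(b)   # relative residual of the inverse-multiply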
Note that the matrix inversion needs to be done only once because the matrix A never changes. After that you only need to do matrix-vector multiplication, which has a better prefactor than doing left division every time, as you said.
For the sparse case, I agree that doing the decomposition is much better.
The constant factor of matrix-vector multiplication is only slightly better than left-division by a factorization, so (as I explained above) you have to be solving a lot of right-hand sides (proportional to the size of the matrix) before computing the matrix inversion is worth it.
For example, with a 1000 \times 1000 matrix:
julia> using BenchmarkTools, LinearAlgebra
julia> LinearAlgebra.BLAS.set_num_threads(1) # single-threaded benchmarking is simpler
julia> A = rand(1000,1000); b = rand(1000); x = similar(b);
julia> F = @btime lu($A);
18.455 ms (6 allocations: 7.64 MiB)
julia> A⁻¹ = @btime inv($A);
54.006 ms (10 allocations: 8.13 MiB)
julia> @btime ldiv!($x, $F, $b);
203.052 μs (0 allocations: 0 bytes)
julia> @btime mul!($x, $A⁻¹, $b);
159.604 μs (0 allocations: 0 bytes)
So, on my machine, you'd have to be solving around 800 right-hand sides (one by one) for the cost of a 1000x1000 matrix inversion to break even.
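That estimate follows directly from the timings above: divide the extra cost of inv over lu by the per-solve savings of mul! over ldiv!:

t_lu, t_inv   = 18.455e-3, 54.006e-3     # seconds, from the benchmarks above
t_ldiv, t_mul = 203.052e-6, 159.604e-6

(t_inv - t_lu) / (t_ldiv - t_mul)        # ≈ 818 right-hand sides to break even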
If you are solving lots of right hand sides at once, i.e. you are computing X = A^{-1} B where B is a matrix, then you need a lot more to hit the break-even point. On my machine the break-even comes at about 2700 simultaneous right-hand sides for a 1000x1000 matrix:
julia> n = 2700; B = rand(1000,n); X = similar(B);
julia> @btime ldiv!($X, $F, $B);
131.404 ms (4 allocations: 20.60 MiB)
julia> @btime mul!($X, $A⁻¹, $B);
95.540 ms (0 allocations: 0 bytes)
(so that the performance loss of ldiv! vs mul! cancels the performance loss of inv vs lu. Together, they both take about 150 ms in total.)
PS. Obviously, if the matrix is tiny, e.g. 2x2 or 3x3, then the constant factors are completely different, especially if you are using StaticArrays.jl.
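For example, a sketch with StaticArrays.jl (assuming it is installed), where a 3x3 inverse comes from a closed-form formula and applying it is just a tiny fixed-size matrix-vector product:

using StaticArrays, LinearAlgebra

A = @SMatrix rand(3, 3)
b = @SVector rand(3)

A⁻¹ = inv(A)      # closed-form inverse for sizes this small
x1 = A⁻¹ * b      # tiny fixed-size matrix-vector product
x2 = A \ b        # also very cheap for static arrays
x1 ≈ x2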
For numerical simulations (with dense matrices), the multiplication or left division will be done an enormous number of times, so the inversion is definitely worth doing. But I haven't tested the possible difference in accuracy.
If you are timestepping with a constant dense matrix inverse, then it sounds like you have a linear time-invariant system and you should probably consider some other method (e.g. exponential integrators). Even if it is a split-step scheme (where only part of the timestep is represented by a constant matrix) there are probably better methods for most small dense problems.
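For instance, if the system really is linear and time-invariant, ẋ = Ax, a matrix exponential computed once gives the exact timestep (a sketch under that assumption; A, Δt, and the initial x below are placeholders):

using LinearAlgebra

m  = 1000
A  = -rand(m, m) - m*I   # placeholder system matrix (made stable for the sketch)
x  = rand(m)             # initial condition
Δt = 0.01

E = exp(Δt * A)          # dense matrix exponential: Θ(m^3), but done only once

for _ in 1:100           # each step is exact for ẋ = Ax and costs Θ(m^2)
    x = E * x
end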
Again, I agree that in cases where you have a dense m \times m matrix A and you want to multiply by A^{-1} \gg m times, it is often worthwhile to compute the inverse explicitly. But I think that such cases are pretty rare … 95% of the time, if you are computing an explicit matrix inverse, you are doing something suboptimal.
True. For large-scale sparse linear systems, calculating the inverse matrix takes much more time than factorizing the sparse matrix, and it takes a lot of memory to store the dense inverse matrix and a lot of time to carry out the dense matrix-vector multiplication (compared to working with the sparse factors).
Papers on explicit time-integration methods sometimes claim that they store the inverse of the stiffness matrix at the beginning, so that each iteration involves only a matrix-vector multiplication. That works in their 3-degree-of-freedom examples, but for large-scale systems storing the factorization of the matrix is much more efficient.