What is the advantage of first decomposing and then left-dividing every time over inverting once and multiplying by the inverse every time?
Can't explain to you why, but at least if I do this, things slow down by twofold. I'd also be interested in why. From my point of view at least it shouldn't be slower?
For a dense m \times m matrix, using the factorization to left-divide every time involves:
- Computing the factorization (typically LU) A = LU (technically PA = LU including pivots, but let me leave that out for simplicity). This has \Theta(m^3) cost, but is only done once for all right-hand sides. F = lu(A) stores L, U, and P implicitly in F.
- Using the factorization to solve Ax = LUx = b for x for each right-hand side, which has \Theta(m^2) cost. In particular, solving LUx = b \Longleftrightarrow x = U^{-1} (L^{-1} b) involves two \Theta(m^2) triangular solves: c = L^{-1} b \Longleftrightarrow Lc = b is computed by forward substitution (top to bottom), and x = U^{-1} c \Longleftrightarrow Ux = c is computed by backsubstitution (bottom to top). x = F \ b does this, as in the sketch below.
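Here is a minimal sketch of the factor-once, solve-many pattern (the matrix and right-hand sides are just random placeholders):

using LinearAlgebra

m = 1000
A = rand(m, m)        # placeholder dense matrix
F = lu(A)             # Θ(m^3), done once; stores L, U, and the pivots

for _ in 1:10         # each solve reuses F at Θ(m^2) cost
    b = rand(m)
    x = F \ b         # forward substitution with L, then backsubstitution with U
    @assert A * x ≈ b # residual check (holds for a reasonably well-conditioned A)
end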
Note that using the factorization to solve Ax = b has \Theta(m^2) cost similar to a matrix-vector multiply, and indeed computing F \ b is roughly comparable to the cost of multiplying x = A^{-1} b if you had precomputed the matrix inverse (though the constant factors for the latter are a bit better).
Inversion of a dense m \times m matrix involves two steps:
- Computing the factorization (typically LU) A = LU (technically PA = LU including pivots, but let me leave that out for simplicity). This has \Theta(m^3) cost.
- Using the factorization to solve Ax = LUx = b for m right-hand sides, corresponding to AX = I. Each right-hand side (each column of I) has \Theta(m^2) cost, so all of these solves together have \Theta(m^3) cost.
(You may have learned the Gauss–Jordan algorithm for A^{-1} in school. This is equivalent to those two steps: Gaussian elimination on A is equivalent to LU factorization, applying the same elimination steps to I is equivalent to the forward substitution C = L^{-1} I, and then the "upwards" elimination is equivalent to the backsubstitution X = U^{-1} C.) Given the inverse,
- Applying A^{-1} to a vector is \Theta(m^2) cost, comparable to a triangular solve (though the constant factors are a bit better).
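As an illustrative sketch of that equivalence (not how you would normally compute an inverse in practice; the matrix here is a random placeholder), the m solves against the columns of I reproduce inv:

using LinearAlgebra

m = 200
A = rand(m, m)                     # placeholder dense matrix
F = lu(A)                          # PA = LU, Θ(m^3)

X = F \ Matrix{Float64}(I, m, m)   # m solves for AX = I: Θ(m^2) each, Θ(m^3) total
@assert X ≈ inv(A)                 # matches the built-in inverse up to roundoff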
Thus, matrix inversion is equivalent to solving m right-hand sides, so it is definitely not worth it if you are solving \ll m right-hand sides, e.g. you are doing just a couple of Newton steps. It might be worth it if you are solving \gg m right-hand sides, because the constant factors for a matrix-vector multiply are better than for two triangular solves, but this seems to be a relatively rare situation.
In general, you should get in the habit of never computing matrix inverses. Read A^{-1} b as âsolve Ax=b by the best-available method.â
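In Julia that habit just means writing the backslash solve instead of forming the inverse, e.g.:

x = A \ b        # good: factorizes (LU for a general square A) and solves Ax = b
x = inv(A) * b   # avoid: forms the entire inverse just to apply it once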
PS. Computing inverses is much, much worse if A is a sparse matrix, because in that case L and U are sparse but A^{-1} is typically dense.
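A rough way to see the sparse case (sizes and sparsity below are arbitrary placeholders) is to compare the storage of the sparse factors with what a dense inverse would need:

using LinearAlgebra, SparseArrays

n = 10_000
A = sprand(n, n, 5/n) + 10I   # random sparse matrix, shifted to keep it nonsingular
b = rand(n)

F = lu(A)                     # sparse LU (UMFPACK); the factors remain sparse
x = F \ b                     # cheap sparse triangular solves

nnz(A)                        # only a handful of nonzeros per row
nnz(F.L) + nnz(F.U)           # some fill-in, but still far from dense
# inv(A) would be essentially dense: n^2 = 10^8 entries ≈ 800 MB of Float64s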
PPS. Sometimes, people argue that solving Ax = b by the matrix inversion is much less accurate because it involves additional steps. Druinsky and Toledo (2012) argue that this is not typically a major concern, however.
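If you want to check this on your own problem, comparing relative residuals of the two approaches is straightforward, e.g. something along these lines:

using LinearAlgebra

A = rand(1000, 1000); b = rand(1000)

x_fact = lu(A) \ b     # factorize and solve
x_inv  = inv(A) * b    # apply an explicitly computed inverse

norm(A * x_fact - b) / norm(b)   # relative residual of the factorization solve
norm(A * x_inv  - b) / norm(b)   # relative residual of the inverse-multiply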
Note that the matrix inversion needs to be done only once because the matrix A never changes. After that you only need to do matrix-vector multiplication, which has a better prefactor than doing left division every time, as you said.
For the sparse case, I agree that doing the decomposition is much better.
The constant factor of matrix-vector multiplication is only slightly better than left-division by a factorization, so (as I explained above) you have to be solving a lot of right-hand sides (proportional to the size of the matrix) before computing the matrix inversion is worth it.
For example, with a 1000 \times 1000 matrix:
julia> using BenchmarkTools, LinearAlgebra
julia> LinearAlgebra.BLAS.set_num_threads(1) # single-threaded benchmarking is simpler
julia> A = rand(1000,1000); b = rand(1000); x = similar(b);
julia> F = @btime lu($A);
18.455 ms (6 allocations: 7.64 MiB)
julia> A⁻¹ = @btime inv($A);
54.006 ms (10 allocations: 8.13 MiB)
julia> @btime ldiv!($x, $F, $b);
203.052 μs (0 allocations: 0 bytes)
julia> @btime mul!($x, $A⁻¹, $b);
159.604 μs (0 allocations: 0 bytes)
So, on my machine, you'd have to be solving around 800 right-hand sides (one by one) for the cost of a 1000x1000 matrix inversion to break even.
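That estimate follows directly from the timings above: divide the extra cost of inv over lu by the per-solve savings of mul! over ldiv!:

t_lu, t_inv   = 18.455e-3, 54.006e-3     # seconds, from the benchmarks above
t_ldiv, t_mul = 203.052e-6, 159.604e-6

(t_inv - t_lu) / (t_ldiv - t_mul)        # ≈ 818 right-hand sides to break even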
If you are solving lots of right hand sides at once, i.e. you are computing X = A^{-1} B where B is a matrix, then you need a lot more to hit the break-even point. On my machine the break-even comes at about 2700 simultaneous right-hand sides for a 1000x1000 matrix:
julia> n = 2700; B = rand(1000,n); X = similar(B);
julia> @btime ldiv!($X, $F, $B);
131.404 ms (4 allocations: 20.60 MiB)
julia> @btime mul!($X, $A⁻¹, $B);
95.540 ms (0 allocations: 0 bytes)
(so that the performance loss of ldiv! vs mul! cancels the performance loss of inv vs lu. Together, they both take about 150 ms in total.)
PS. Obviously, if the matrix is tiny, e.g. 2x2 or 3x3, then the constant factors are completely different, especially if you are using StaticArrays.jl.
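For example, a sketch with StaticArrays.jl (assuming it is installed), where a 3x3 inverse comes from a closed-form formula and applying it is just a tiny fixed-size matrix-vector product:

using StaticArrays, LinearAlgebra

A = @SMatrix rand(3, 3)
b = @SVector rand(3)

A⁻¹ = inv(A)      # closed-form inverse for sizes this small
x1 = A⁻¹ * b      # tiny fixed-size matrix-vector product
x2 = A \ b        # also very cheap for static arrays
x1 ≈ x2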
For numerical simulations (with dense matrices), the multiplication or left division will be done an enormous number of times, so the inversion is definitely worth doing. But I haven't tested the possible difference in accuracy.
If you are timestepping with a constant dense matrix inverse, then it sounds like you have a linear time-invariant system and you should probably consider some other method (e.g. exponential integrators). Even if it is a split-step scheme (where only part of the timestep is represented by a constant matrix) there are probably better methods for most small dense problems.
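For instance, if the system really is linear and time-invariant, ẋ = Ax, a matrix exponential computed once gives the exact timestep (a sketch under that assumption; A, Δt, and the initial x below are placeholders):

using LinearAlgebra

m  = 1000
A  = -rand(m, m) - m*I   # placeholder system matrix (made stable for the sketch)
x  = rand(m)             # initial condition
Δt = 0.01

E = exp(Δt * A)          # dense matrix exponential: Θ(m^3), but done only once

for _ in 1:100           # each step is exact for ẋ = Ax and costs Θ(m^2)
    x = E * x
end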
Again, I agree that in cases where you have a dense m \times m matrix A and you want to multiply by A^{-1} \gg m times, it is often worthwhile to compute the inverse explicitly. But I think that such cases are pretty rare … 95% of the time, if you are computing an explicit matrix inverse, you are doing something suboptimal.
True. For large-scale sparse linear systems, calculating the inverse matrix takes much more time than factorizing the sparse matrix, and it takes a lot of memory to store the dense inverse matrix and a lot of time to carry out the dense matrix-vector multiplication (compared to working with the sparse factors).
Papers on explicit time-integration methods sometimes claim that they store the inverse of the stiffness matrix at the beginning, so that each iteration involves only a matrix-vector multiplication. That works in their 3-degree-of-freedom examples, but for large-scale systems storing the factorization of the matrix is much more efficient.