Non-allocating matrix inversion

danielwe · April 23, 2022, 9:56pm

Perhaps it’s worth adding to this thread that if your matrix is symmetric/hermitian positive definite, you can compute an in-place inverse with zero (pre-)allocations through a Cholesky decomposition. The undocumented LinearAlgebra.inv!(cholesky!(A)) gets you most of the way there, but to get every last ounce of performance you need to use the LAPACK API as shown below (in both cases you get a substantial speedup compared to the naive out-of-place inv(A) just from the fact that you’re using a Cholesky decomposition).

Here’s how you do this through the LAPACK API. Note that this also works for CuArrays because CUDA.jl adds CUSOLVER-wrapping methods to LAPACK.potr{f,s,i}!.

using LinearAlgebra
using LinearAlgebra: BlasReal, BlasFloat

function inv!(A::Union{Symmetric{<:BlasReal},Hermitian{<:BlasFloat}})
    _, info = LAPACK.potrf!(A.uplo, A.data)
    (info == 0) || throw(PosDefException(info))
    LAPACK.potri!(A.uplo, A.data)
    return A
end

Note that this function relies on A being <: Union{Symmetric,Hermitian} such that only one triangle of A.data, indicated by A.uplo, is considered data both for input and output. The LAPACK functions will use the other triangle as workspace and leave garbage there upon returning. If you need A.data to contain a valid inverse at the end you need to copy data over from the active triangle. This is what LinearAlgebra.inv!(cholesky!(A)) does for you and is the reason it takes a little more time.

Apart from this, LinearAlgebra.inv!(cholesky!(A)) and inv!(A) are basically equivalent; they make the same LAPACK calls under the hood, and while cholesky! has a type instability and hence a dynamic lookup and two tiny allocations, the performance hit from this is entirely insignificant (about 0.4 μs when I isolate the difference in benchmarks, compared to >100 μs for computing a 100x100 inverse on my laptop).

Benchmarks at 100x100:

julia> using BenchmarkTools

julia> function randposdef(N)
           A = randn(N, N)
           return Symmetric(A * A' + I)
       end
randposdef (generic function with 1 method)

julia> @btime inv(A) setup=(A = randposdef(100)) evals=1;
  177.223 μs (5 allocations: 129.17 KiB)

julia> @btime inv(cholesky!(A)) setup=(A = randposdef(100)) evals=1;
  128.572 μs (4 allocations: 78.22 KiB)

julia> @btime LinearAlgebra.inv!(cholesky!(A)) setup=(A = randposdef(100)) evals=1;
  121.586 μs (2 allocations: 48 bytes)

julia> @btime inv!(A) setup=(A = randposdef(100)) evals=1;
  114.315 μs (0 allocations: 0 bytes)

Topic		Replies	Views
How do I lessen memory allocation and speedup block inverse? Performance	2	483	March 20, 2020
Help to reduce number of allocations General Usage	23	5486	January 6, 2019
Computing Inverse of a stack of matrices General Usage performance , linearalgebra	19	3569	August 29, 2020
Reducing allocations when subtracting vectors and multiplying by scalars Performance memory-allocation , optimization	9	453	November 9, 2021
Linear algebra performance issue General Usage	20	1086	July 10, 2020

Non-allocating matrix inversion

Related topics