How to do in-place least squares `\`?

yhchang96 · July 3, 2023, 3:02pm

Suppose I want to solve a least-squares problem. I can just do x = A \ b where A is a ‘tall’ matrix. Is there an in-place version of this? If I’m not mistaken, it just performs a QR factorization behind the scenes to solve for x, but I’m not sure how to call the underlying functions so that they allocate as little memory as possible. I don’t mind overwriting the entries of A, and I can supply some cache matrices to reduce allocations.

Oscar_Smith · July 3, 2023, 3:10pm

You should probably be using LinearSolve.jl for this.

gdalle · July 3, 2023, 3:15pm

Among the iterative solver backends offered by LinearSolve.jl, I know Krylov.jl has an in-place version; I’m not sure about the others.

yhchang96 · July 3, 2023, 3:23pm

The typical size of A in my case is only size(A) = (128, 4), and I’m not sure that counts as ‘large’ enough to justify iterative solvers. I just need to solve A \ b many times as part of an optimization problem, where the entries of A change each time. Right now something like 98% of the run time is spent in GC, so I suspect that some sort of in-place version of A \ b might solve the memory allocation problem.
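For readers who want to reproduce the allocation problem described above, here is a minimal sketch of how the per-solve allocation of plain `\` could be measured. The helper name `solve_many` and the use of `randn!` as a stand-in for recomputing A are assumptions for illustration, not part of the original code.

```julia
using LinearAlgebra, Random

# Hypothetical helper: solve a same-sized problem n times with plain `\`.
function solve_many(A, b, n)
    s = 0.0
    for _ in 1:n
        randn!(A)      # stand-in for recomputing the entries of A
        x = A \ b      # allocates fresh arrays on every call
        s += x[1]      # use the result so the solve is not optimized away
    end
    return s
end

A = randn(128, 4)
b = randn(128)

solve_many(A, b, 1)                        # warm up so compilation is not measured
bytes = @allocated solve_many(A, b, 1000)  # total bytes allocated by 1000 solves
println("bytes per solve ≈ ", bytes / 1000)
```

Running the same measurement on an in-place variant gives a direct before/after comparison.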

nsajko · July 3, 2023, 3:52pm

Have you tried StaticArrays? I think this may be small enough. I don’t know whether they implement QR decomposition, but try it if you have not yet.

mikmoore · July 3, 2023, 4:09pm

Unfortunately, solving A \ b for many different A generally requires refactorizing A every time, and the factorization is the computationally expensive part. The exception is when the factorization can be efficiently updated, for example with low-rank updates to a Cholesky factorization.
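As an illustration of the update case only (it does not apply when every entry of A changes), here is a minimal sketch using `lowrankupdate!` from LinearAlgebra on the normal equations. It assumes the only change is one appended row `a_new` with target `b_new`, both hypothetical names. Note that the normal equations square the condition number, so this is less robust than QR.

```julia
using LinearAlgebra

A = randn(128, 4)
b = randn(128)

# Factorize the normal equations once: (A'A) x = A'b
C = cholesky(Symmetric(A' * A))
rhs = A' * b

# Hypothetical change: one new row a_new' is appended to A, with target b_new.
a_new = randn(4)
b_new = randn()

# Appending a row adds a_new * a_new' to A'A, which is a rank-one update.
# lowrankupdate! overwrites its vector argument, hence the copy.
lowrankupdate!(C, copy(a_new))
rhs .+= a_new .* b_new

x = C \ rhs                                # solve with the updated factorization
x_ref = [A; a_new'] \ [b; b_new]           # reference: refactorize from scratch
```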

But you can still save some memory allocations. Since you say GC time is significant, the savings can be significant too. You want to use some in-place methods like the following:

```julia
using LinearAlgebra, Random

# initialize (A is tall: 128 rows, 4 columns)
b = Vector{Float64}(undef, 128)     # preallocate memory for b
A = Matrix{Float64}(undef, 128, 4)  # preallocate memory for A

# loop (1:1000 is a placeholder; use your own iteration here)
for _ in 1:1000
    # compute input vector and matrix
    randn!(b) # overwrite b with randn values
    randn!(A) # overwrite A with randn values

    # solve A \ b
    Aqr = qr!(A, ColumnNorm()) # factorize A, with permission to overwrite the memory of A to reduce memory allocations
    ldiv!(Aqr, b) # overwrite b with the solve; the solution A \ b is in b[1:size(A, 2)]
end
```

The ! at the end of these function names signifies that they are permitted/intended to mangle one or more of the inputs.

Replace my randn! calls with your actual computations of A and b. Ideally, compute these in recycled memory to avoid additional allocations. Look into other LinearAlgebra mutating functions (like mul!) to help with that part, and also in-place broadcasting with .= assignments.
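As a rough sketch of what filling A and b in recycled memory can look like: the setup below is entirely hypothetical (a fixed basis `B` scaled column-wise by parameters `p`, and a fixed matrix `G` applied to a changing vector `y`). It is only meant to show `.=` and `mul!` writing into preallocated buffers.

```julia
using LinearAlgebra

# Hypothetical fixed data
B = randn(128, 4)
G = randn(128, 128)

# Reused buffers, allocated once outside the hot loop
A = similar(B)
b = Vector{Float64}(undef, 128)

# Hypothetical helper that refills A and b without allocating new arrays
function refill!(A, b, B, G, p, y)
    A .= B .* p'     # in-place broadcast: column j of A is p[j] times column j of B
    mul!(b, G, y)    # in-place matrix-vector product: b = G * y
    return nothing
end

p = [1.0, 2.0, 3.0, 4.0]
y = randn(128)
refill!(A, b, B, G, p, y)
```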

A \ b does the same steps of factorizing A and then solving the system with the factorization. But here we take some manual control so that we allocate less in the process.

The qr! allows us to mangle the input values in A so we allocate less memory for the factorization. Do not expect A to hold its original values afterward. Aqr shares memory with A, so changing A will change Aqr and make it incorrect; don’t change A until you’re done using Aqr.

The ldiv! allows us to overwrite b with the result to save an additional memory allocation. Because A is tall, b is longer than the solution, so read the solution from the first size(A, 2) entries of b. b will have its values changed, but Aqr can still be used for additional solves if you have more vectors to solve against.
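To make the bookkeeping concrete, here is a minimal sketch that wraps the two calls in a function and checks the result against plain `\`. The helper name `lstsq!` and the separate output buffer `x` are assumptions for illustration.

```julia
using LinearAlgebra

# Hypothetical helper: solves min ‖A*x - b‖ for tall A.
# Overwrites A and b, and copies the solution into the preallocated x.
function lstsq!(x, A, b)
    n = size(A, 2)
    F = qr!(A, ColumnNorm())    # overwrites A with the factorization data
    ldiv!(F, b)                 # overwrites b; its first n entries hold the solution
    copyto!(x, view(b, 1:n))    # copy the solution into x without a temporary
    return x
end

A = randn(128, 4)
b = randn(128)
x_ref = A \ b                   # reference, computed before A and b are overwritten

x = Vector{Float64}(undef, 4)   # allocated once, reusable across solves
lstsq!(x, A, b)
```

If the trailing entries of b are not needed, the copy into `x` can be skipped and `view(b, 1:size(A, 2))` used directly.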
