I wanted to see how important removing allocations were when repeatedly solving linear systems (Ax=b), so I built a quick test package to try things out: GitHub - RobertGregg/QRLite.jl · GitHub. The interface uses FastLapackInterface.jl to store work-spaces for the LAPACK calls and the rest is just making sure I arrive at the same solution as the backslash operator (e.g., using pivoted QR). Below is an example of solving a system were A and b change at the same time:
using QRLite, BenchmarkTools
n = 50
k = 5
#A is not square to avoid \ from using LU instead of QR
A = rand(2n,n)
B = rand(2n,k)
linsolve = CachedQR(A,B)
x = solution(linsolve)
x ≈ A \ B #true
Anew = rand(2n,n)
Bnew = rand(2n,k)
#Change A and B at the same time
updateQR!(linsolve, Anew, Bnew)
solution(linsolve) ≈ Anew \ Bnew #true
#Check for allocations
@benchmark updateQR!($linsolve, $Anew, $Bnew)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 49.300 μs … 541.400 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 53.300 μs ┊ GC (median): 0.00%
Time (mean ± σ): 54.192 μs ± 10.592 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▃ █
██▅▄▇█▇█▆█▄▇█▃▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▁▂▂▂▁▂▂▂▂▂ ▃
49.3 μs Histogram: frequency by time 83.5 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
Everything seems to be working. However, what I found surprising was what happened with the benchmark comparisons:
Avoiding allocations showed a decent speedup for small systems (e.g., A size = 20 by 10), but once you get past a few hundred rows and columns, the timings are nearly identical. This was also the case when more columns were added to b. From this, I would conclude avoiding allocations is really only important if you’re solving a huge number of small systems.
If people think this kind of package would be helpful I could polish it up and register it. ![]()
