I would also note that mutating gmres!
allocates memory even without using LinearMaps
.
julia> using IterativeSolvers, BenchmarkTools
julia> function f!(x, A, b, n_runs=1)
for i in 1:n_runs
x .= 0
gmres!(x, A, b)
end
return nothing
end
f! (generic function with 2 methods)
julia> x = zeros(2)
2-element Vector{Float64}:
0.0
0.0
julia> A = [1.0 2.0; 3.0 4.0]
2×2 Matrix{Float64}:
1.0 2.0
3.0 4.0
julia> b = [5.0, 6.0]
2-element Vector{Float64}:
5.0
6.0
julia> f!(x, A, b, 1) # Run once to compile
julia> @btime f!(x, A, b, 1)
1.393 μs (15 allocations: 1.23 KiB)
julia> @btime f!(x, A, b, 10)
13.151 μs (150 allocations: 12.34 KiB)
julia> @btime f!(x, A, b, 100)
131.335 μs (1500 allocations: 123.44 KiB)
julia> @btime gmres(A, b)
1.374 μs (16 allocations: 1.31 KiB)
As you can see, each call to gmres!
makes 15 allocations even for the simplest possible case of ordinary matrices and vectors. gmres
makes 16 allocations, one more to allocate the return vector. But most of the allocations appear to be used internally by the algorithm.
It would be nice if we had an option to provide a ‘cache’ variable, so that gmres!
didn’t have to allocate memory each time it ran.
The good news is that (at least for this small example), the time taken to allocate memory doesn’t tank performance.
julia> function allocate_memory(n_runs; alloc_size=1)
for i in 1:n_runs
a = Vector{Int64}(undef, alloc_size)
a[1] += 1 # If we don't do some operation on a, the compiler won't allocate a in the first place
end
end
allocate_memory (generic function with 2 methods)
julia> @btime allocate_memory(15, alloc_size=1)
258.078 ns (15 allocations: 960 bytes)
julia> @btime allocate_memory(15, alloc_size=1000)
1.601 μs (15 allocations: 119.06 KiB)
julia> @btime f!(x, A, b, 1) # Run once to compile
1.396 μs (15 allocations: 1.23 KiB)
julia> @btime allocate_memory(15, alloc_size=2)
263.934 ns (15 allocations: 1.17 KiB)
It looks like alloc_size=2
allocates the amount of memory closest to that of gmres!
(probably allocates 15 vectors the size of the system, or something like that). So I would estimate that in this example 0.263 microseconds of the 1.393 microsecond runtime is spent allocating memory. So gmres!
takes maybe 25% longer than it would have if we managed to avoid allocating memory.
This could change significantly depending on the values and sizes of A
and b
(perhaps more allocations are performed if more iterations are required, I don’t know the internals of the function). Using gmres!
to solve a 2x2 system Ax=b
, where A
and b
are known explicitly, is not a typical use case. (backslash solves the same problem in < 0.3 microseconds, and I think almost all of that time is spent allocating memory).
I tried a random 1000x1000 example, and gmres took ~100 ms, while I estimate the memory allocation took < 0.1 ms.
Still, it would be nice to have a ‘cached’ option, just to be sure that memory allocation isn’t significantly affecting performance.