IterativeSolvers.jl not working as expected with mutating and allocation

I’m trying to solve a simple linear system. However, the preallocated (mutating) versions of the IterativeSolvers.jl functions seem to be either much slower or to allocate more memory than the non-mutating ones.

Here’s the example:

```julia
using IterativeSolvers
using LinearAlgebra  # for norm
using LinearMaps

function SecondOrderCentralDiffmul!(C, B)
    C[1] = -2B[1] + B[2]
    for i in 2:length(B)-1
        C[i] = B[i-1] - 2B[i] + B[i+1]
    end
    C[end] = B[end-1] - 2B[end]
    return C
end

nnn = 51
A = LinearMap(SecondOrderCentralDiffmul!, nnn; issymmetric=true, ismutating=true)
b = rand(nnn)

U = gmres(A, b, maxiter=10000000)
norm(A*U - b)
```

With the following tests:

`@btime gmres(A, b, maxiter=10000000);`

`697.949 μs (43 allocations: 19.88 KiB)`

And mutating:

```julia
xxx = zeros(Float64, nnn)
@btime gmres!(xxx, A, b, maxiter=10000000);
```

`13.037 s (500015 allocations: 106.82 MiB)`

However, when I run without `@btime` :

`@time gmres!(xxx, A, b, maxiter=10000000);`

`0.000785 seconds (42 allocations: 19.391 KiB)`

So first, what’s going on with `@btime` in these tests? And second, why are the allocations the same for both `gmres` and `gmres!`?

This is actually not related to IterativeSolvers but to subtleties of BenchmarkTools, mostly the fact that it runs the function several times to get a better estimate of the time.

First, as a rule, you should always interpolate global variables when benchmarking.
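For example, the difference shows up even with a trivial function (a minimal sketch; exact timings will vary on your machine):

```julia
using BenchmarkTools

v = rand(1000)  # a non-constant global variable

# Without interpolation, `v` is accessed as an untyped global inside the
# benchmark loop, so dynamic-dispatch overhead pollutes the measurement:
@btime sum(v)

# With `$`-interpolation, `v` is spliced in as a typed local value,
# which isolates the cost of `sum` itself:
@btime sum($v)
```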

Furthermore, running the mutating version several times is biased. Indeed, for `gmres!`, the first argument is not just a scratch space: it is also the initial guess. Thus, once you have run the solver, the initial guess actually contains the optimal solution, and this seems to cause weird behavior (although I’m not sure why we see a slowdown and not a speedup).
To avoid this issue, you need to:

• create a `setup` phase in your benchmark to initialize it with a new zero vector every time
• set `evals = 1` in the benchmark parameters to make sure that each benchmark sample runs `gmres!` only once, thus avoiding the bias
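To see the bias concretely on a tiny system (a sketch; the matrix and tolerances are just illustrative): after the first `gmres!` call, the first argument already holds the solution, so every subsequent call starts from an already-converged guess.

```julia
using IterativeSolvers, LinearAlgebra

A = [4.0 1.0; 1.0 3.0]  # small symmetric positive-definite system
b = [1.0, 2.0]
x = zeros(2)

gmres!(x, A, b)          # first call: starts from the zero vector
r1 = norm(A * x - b)     # small residual: x now holds the solution

gmres!(x, A, b)          # second call: starts *at* the solution, so it
                         # benchmarks a different (trivial) problem
r2 = norm(A * x - b)
```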

When you do all that, you realize that `@btime` does indeed give you a more accurate (and lower) result than `@time`:

```julia
julia> @btime gmres($A, $b, maxiter=10000000);
  603.156 μs (43 allocations: 19.86 KiB)

julia> @btime gmres!($xxx, $A, $b, maxiter=10000000);
  12.053 s (500015 allocations: 106.82 MiB)

julia> @btime gmres!(_xxx, $A, $b, maxiter=10000000) evals=1 setup=(_xxx = zeros(nnn));
  603.605 μs (43 allocations: 19.41 KiB)

julia> @time gmres(A, b, maxiter=10000000);
  0.000894 seconds (43 allocations: 19.859 KiB)
```
6 Likes

@gdalle awesome!

One last thing: I still don’t understand why the mutating `gmres!` allocates the same amount of memory as the non-mutating version.

I would also note that mutating `gmres!` allocates memory even without using `LinearMaps`.

```julia
julia> using IterativeSolvers, BenchmarkTools

julia> function f!(x, A, b, n_runs=1)
           for i in 1:n_runs
               x .= 0
               gmres!(x, A, b)
           end
           return nothing
       end
f! (generic function with 2 methods)

julia> x = zeros(2)
2-element Vector{Float64}:
 0.0
 0.0

julia> A = [1.0 2.0; 3.0 4.0]
2×2 Matrix{Float64}:
 1.0  2.0
 3.0  4.0

julia> b = [5.0, 6.0]
2-element Vector{Float64}:
 5.0
 6.0

julia> f!(x, A, b, 1) # Run once to compile

julia> @btime f!(x, A, b, 1)
  1.393 μs (15 allocations: 1.23 KiB)

julia> @btime f!(x, A, b, 10)
  13.151 μs (150 allocations: 12.34 KiB)

julia> @btime f!(x, A, b, 100)
  131.335 μs (1500 allocations: 123.44 KiB)

julia> @btime gmres(A, b)
  1.374 μs (16 allocations: 1.31 KiB)
```

As you can see, each call to `gmres!` makes 15 allocations even for the simplest possible case of ordinary matrices and vectors. `gmres` makes 16 allocations, one more to allocate the return vector. But most of the allocations appear to be used internally by the algorithm.
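A quick way to confirm the per-call workspace allocations without BenchmarkTools is `@allocated`, which reports the bytes allocated by a single call (a sketch; the first call is a warm-up so compilation isn’t counted):

```julia
using IterativeSolvers

A = [1.0 2.0; 3.0 4.0]
b = [5.0, 6.0]
x = zeros(2)

gmres!(x, A, b)                     # warm-up call so compilation is excluded
x .= 0
bytes = @allocated gmres!(x, A, b)  # internal workspace allocated per call
println(bytes)                      # nonzero even though x is reused
```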

It would be nice if we had an option to provide a ‘cache’ variable, so that `gmres!` didn’t have to allocate memory each time it ran.

The good news is that (at least for this small example), the time taken to allocate memory doesn’t tank performance.

```julia
julia> function allocate_memory(n_runs; alloc_size=1)
           for i in 1:n_runs
               a = Vector{Int64}(undef, alloc_size)
               a[1] += 1 # If we don't do some operation on a, the compiler won't allocate a in the first place
           end
       end
allocate_memory (generic function with 2 methods)

julia> @btime allocate_memory(15, alloc_size=1)
  258.078 ns (15 allocations: 960 bytes)

julia> @btime allocate_memory(15, alloc_size=1000)
  1.601 μs (15 allocations: 119.06 KiB)

julia> @btime f!(x, A, b, 1) # Run once to compile
  1.396 μs (15 allocations: 1.23 KiB)

julia> @btime allocate_memory(15, alloc_size=2)
  263.934 ns (15 allocations: 1.17 KiB)
```

It looks like `alloc_size=2` allocates the amount of memory closest to that of `gmres!` (probably 15 vectors the size of the system, or something like that). So I would estimate that, in this example, about 0.26 of the 1.39 microseconds of runtime are spent allocating memory, meaning `gmres!` takes maybe 25% longer than it would if we managed to avoid the allocations.

This could change significantly depending on the values and sizes of `A` and `b` (perhaps more allocations are performed when more iterations are required; I don’t know the internals of the function). Using `gmres!` to solve a 2×2 system `Ax = b`, where `A` and `b` are known explicitly, is not a typical use case (backslash solves the same problem in under 0.3 microseconds, and I think almost all of that time is spent allocating memory).
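For reference, the direct-solve comparison mentioned above is just backslash (a sketch on the same 2×2 system):

```julia
using BenchmarkTools, LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
b = [5.0, 6.0]

# `\` performs an LU factorization plus a triangular solve; its allocations
# are the factorization workspace and the result vector:
@btime $A \ $b
```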

I tried a random 1000x1000 example, and gmres took ~100 ms, while I estimate the memory allocation took < 0.1 ms.

Still, it would be nice to have a ‘cached’ option, just to be sure that memory allocation isn’t significantly affecting performance.

1 Like

I definitely agree. The fact that we can’t provide this option means that, for many specific cases, the user has to write their own iterative solver in order to manage memory and speed better.

Actually I think there is an advanced interface where you can avoid repeated allocation. See here: The iterator approach · IterativeSolvers.jl

1 Like

@abraemer wow completely missed this section. Good catch

You’re right, thanks!

I would note that it’s not immediately obvious how to avoid repeated allocation with gmres, since the user has to repeat some of the initialization when setting a new value for `b`. Some of the solvers, like Jacobi, are much easier (just set a new `b` and do the iteration).

Here is a function which I think is sufficient to ‘reinitialize’ the gmres iterable with a new initial guess `x` and a new right-hand-side `b`:

```julia
function update_gmres_iterable!(iterable, x, b)
    iterable.b .= b
    iterable.x .= x
    iterable.mv_products = 0
    iterable.arnoldi.H .= 0
    iterable.arnoldi.V .= 0
    iterable.residual.accumulator = 1
    iterable.residual.current = 1
    iterable.residual.nullvec .= 1
    iterable.residual.β = 1
    iterable.residual.current = IterativeSolvers.init!(
        iterable.arnoldi, iterable.x, iterable.b, iterable.Pl, iterable.Ax,
        initially_zero=false
    )
    iterable.residual.nullvec .= 1
    IterativeSolvers.init_residual!(iterable.residual, iterable.residual.current)
    iterable.β = iterable.residual.current
    return nothing
end
```

And here is an example of it in action:

```julia
julia> A = [1.0 2; 3 4];

julia> b = [2.0, 4];

julia> x = [-1.0, 1];

julia> gmres_iter = IterativeSolvers.gmres_iterable!(x, A, b, abstol=1e-10, reltol=1e-10);

julia> for (i, iter) in enumerate(gmres_iter)
           println("iteration $i done")
       end
iteration 1 done
iteration 2 done

julia> A * x
2-element Vector{Float64}:
 2.0
 4.0

julia> x2 = [-2.0, 2.0]; b2 = [4.0, 8.0];

julia> update_gmres_iterable!(gmres_iter, x2, b2)

julia> for (i, iter) in enumerate(gmres_iter)
           println("iteration $i done")
       end
iteration 1 done
iteration 2 done

julia> A * x
2-element Vector{Float64}:
 4.0
 8.0
```
1 Like

That seems like useful information that could be preserved. I suggest opening a pull request to IterativeSolvers.jl to add this to the documentation as an example usage of the iterator interface.

2 Likes

Thanks, I’m working on it now.