IterativeSolvers.jl: mutating solvers slower and allocating more than expected

I’m trying to solve a simple linear system. However, the preallocated (mutating) versions of the IterativeSolvers.jl functions seem to be either much slower or to allocate more memory than the non-mutating ones.

Here’s the example:

using IterativeSolvers
using LinearMaps
using LinearAlgebra   # for norm
using BenchmarkTools  # for @btime

# In-place product C = A*B for the 1D second-order central difference operator
# (tridiagonal with -2 on the diagonal and 1 on the off-diagonals)
function SecondOrderCentralDiffmul!(C, B)
    C[1] = -2B[1] + B[2]
    for i in 2:length(B)-1
        C[i] = B[i-1] - 2B[i] + B[i+1]
    end
    C[end] = B[end-1] - 2B[end]
    return C
end

nnn = 51
A = LinearMap(SecondOrderCentralDiffmul!, nnn; issymmetric=true, ismutating=true)
b = rand(nnn)

U = gmres(A, b, maxiter=10000000)
norm(A*U - b)
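
As a quick sanity check, the LinearMap should agree with the dense tridiagonal second-difference matrix (something along these lines, Adense being just a throwaway name):

Adense = Tridiagonal(ones(nnn-1), -2*ones(nnn), ones(nnn-1))
norm(A*b - Adense*b)   # ≈ 0: the matrix-free operator matches the dense matrix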

With the following tests:

@btime gmres(A, b, maxiter=10000000);

697.949 μs (43 allocations: 19.88 KiB)

And the mutating version:

xxx = zeros(Float64, nnn)
@btime gmres!(xxx, A, b, maxiter=10000000);

13.037 s (500015 allocations: 106.82 MiB)

However, when I run it with @time instead of @btime:

@time gmres!(xxx, A, b, maxiter=10000000);

0.000785 seconds (42 allocations: 19.391 KiB)

So first, what’s going on with @btime for these tests? Second, why are the allocations the same for both gmres and gmres!?

This is actually not related to IterativeSolvers but to subtleties of BenchmarkTools, mostly the fact that it runs the function several times to get a better estimate of the time.

First, as a rule, you should always interpolate global variables when benchmarking.
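
For example, with any toy function (a rough generic sketch; exact numbers will differ):

v = rand(1000)
@btime sum(v)    # v is a non-constant global: the benchmark picks up type-instability overhead
@btime sum($v)   # interpolating v with $ benchmarks just the call itself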

Furthermore, running the mutating version several times is biased. Indeed, for gmres!, the first argument is not just a scratch space: it is also the initial guess. Thus, once you have run the solver, the initial guess actually contains the optimal solution, and this seems to cause weird behavior (although I’m not sure why we see a slowdown and not a speedup).
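
You can see this directly with a rough sketch like the following (yyy is just a fresh name for illustration):

yyy = zeros(nnn)
gmres!(yyy, A, b, maxiter=10000000)
norm(A*yyy - b)   # small: yyy now holds the (approximate) solution, so every subsequent
                  # benchmark run of gmres! starts from an already-converged initial guess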
To escape this issue, you need to

  • create a setup phase in your benchmark to initialize it with a new zero vector every time
  • set evals = 1 in the benchmark parameters to make sure that each benchmark sample runs gmres! only once, thus avoiding the bias

When you do all that, you realize that @btime does indeed give you a more accurate (and lower) result than @time:

julia> @btime gmres($A, $b, maxiter=10000000);
  603.156 μs (43 allocations: 19.86 KiB)

julia> @btime gmres!($xxx, $A, $b, maxiter=10000000);
  12.053 s (500015 allocations: 106.82 MiB)

julia> @btime gmres!(_xxx, $A, $b, maxiter=10000000) evals=1 setup=(_xxx = zeros(nnn));
  603.605 μs (43 allocations: 19.41 KiB)

julia> @time gmres(A, b, maxiter=10000000);
  0.000894 seconds (43 allocations: 19.859 KiB)

@gdalle awesome!

One last thing: why does the mutating gmres! still allocate the same amount of memory as the non-mutating version?

I would also note that mutating gmres! allocates memory even without using LinearMaps.

julia> using IterativeSolvers, BenchmarkTools

julia> function f!(x, A, b, n_runs=1)
         for i in 1:n_runs
           x .= 0
           gmres!(x, A, b)
         end
         return nothing
       end
f! (generic function with 2 methods)

julia> x = zeros(2)
2-element Vector{Float64}:
 0.0
 0.0

julia> A = [1.0 2.0; 3.0 4.0]
2×2 Matrix{Float64}:
 1.0  2.0
 3.0  4.0

julia> b = [5.0, 6.0]
2-element Vector{Float64}:
 5.0
 6.0

julia> f!(x, A, b, 1) # Run once to compile

julia> @btime f!(x, A, b, 1)
  1.393 μs (15 allocations: 1.23 KiB)

julia> @btime f!(x, A, b, 10)
  13.151 μs (150 allocations: 12.34 KiB)

julia> @btime f!(x, A, b, 100)
  131.335 μs (1500 allocations: 123.44 KiB)

julia> @btime gmres(A, b)
  1.374 μs (16 allocations: 1.31 KiB)

As you can see, each call to gmres! makes 15 allocations even for the simplest possible case of ordinary matrices and vectors. gmres makes 16 allocations, one more to allocate the return vector. But most of the allocations appear to be used internally by the algorithm.

It would be nice if we had an option to provide a ‘cache’ variable, so that gmres! didn’t have to allocate memory each time it ran.

The good news is that (at least for this small example), the time taken to allocate memory doesn’t tank performance.

julia> function allocate_memory(n_runs; alloc_size=1)
         for i in 1:n_runs
           a = Vector{Int64}(undef, alloc_size)
           a[1] += 1 # If we don't do some operation on a, the compiler won't allocate a in the first place
         end
       end
allocate_memory (generic function with 2 methods)

julia> @btime allocate_memory(15, alloc_size=1)
  258.078 ns (15 allocations: 960 bytes)

julia> @btime allocate_memory(15, alloc_size=1000)
  1.601 μs (15 allocations: 119.06 KiB)

julia> @btime f!(x, A, b, 1) # Run once to compile
  1.396 μs (15 allocations: 1.23 KiB)

julia> @btime allocate_memory(15, alloc_size=2)
  263.934 ns (15 allocations: 1.17 KiB)

It looks like alloc_size=2 comes closest to the amount of memory allocated by gmres! (it probably allocates 15 vectors the size of the system, or something like that). I would therefore estimate that in this example about 0.263 of the 1.393 microseconds of runtime are spent allocating memory, so gmres! takes maybe 25% longer (1.393 μs versus roughly 1.13 μs) than it would if we managed to avoid allocating memory.

This could change significantly depending on the values and sizes of A and b (perhaps more allocations are performed if more iterations are required, I don’t know the internals of the function). Using gmres! to solve a 2x2 system Ax=b, where A and b are known explicitly, is not a typical use case. (backslash solves the same problem in < 0.3 microseconds, and I think almost all of that time is spent allocating memory).

I tried a random 1000x1000 example, and gmres took ~100 ms, while I estimate the memory allocation took < 0.1 ms.
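
Something along these lines (a rough sketch of such a test; timings will vary):

A_big = rand(1000, 1000)   # A_big/b_big are just illustrative names
b_big = rand(1000)
@btime gmres($A_big, $b_big);   # on the order of 100 ms for me, dominated by the iterations rather than by allocation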

Still, it would be nice to have a ‘cached’ option, just to be sure that memory allocation isn’t significantly affecting performance.


I definitely agree. Because we can’t provide such a cache, in many specific cases the user will have to write their own iterative solver in order to get better control over memory and speed.

Thanks again for the answers!

Actually I think there is an advanced interface where you can avoid repeated allocation. See here: The iterator approach · IterativeSolvers.jl


@abraemer wow, I completely missed this section. Good catch!

You’re right, thanks!

I would note that it’s not immediately obvious how to avoid repeated allocation with gmres, since the user has to repeat some of the initialization process when setting a new value for b. But some of the solvers, like jacobi, are much easier (just set a new b and do the iteration).

Here is a function which I think is sufficient to ‘reinitialize’ the gmres iterable with a new initial guess x and a new right-hand-side b:

function update_gmres_iterable!(iterable, x, b)
    # copy in the new right-hand side and initial guess
    iterable.b .= b
    iterable.x .= x
    # reset the iteration state (Arnoldi basis, Hessenberg matrix, residual bookkeeping)
    iterable.mv_products = 0
    iterable.arnoldi.H .= 0
    iterable.arnoldi.V .= 0
    iterable.residual.accumulator = 1
    iterable.residual.current = 1
    iterable.residual.nullvec .= 1
    iterable.residual.β = 1
    # recompute the initial Krylov vector and residual for the new system
    iterable.residual.current = IterativeSolvers.init!(
        iterable.arnoldi, iterable.x, iterable.b, iterable.Pl, iterable.Ax,
        initially_zero=false
    )
    iterable.residual.nullvec .= 1
    IterativeSolvers.init_residual!(iterable.residual, iterable.residual.current)
    iterable.β = iterable.residual.current
    return nothing
end

And here is an example of it in action:

julia> A = [1.0 2;3 4];

julia> b = [2.0, 4];

julia> x = [-1.0, 1];

julia> gmres_iter = IterativeSolvers.gmres_iterable!(x, A, b, abstol=1e-10, reltol=1e-10);

julia> for (i, iter) in enumerate(gmres_iter)
         println("iteration $i done")
       end
iteration 1 done
iteration 2 done

julia> A * x
2-element Vector{Float64}:
 2.0
 4.0

julia> x2 = [-2.0, 2.0]; b2 = [4.0, 8.0];

julia> update_gmres_iterable!(gmres_iter, x2, b2)

julia> for (i, iter) in enumerate(gmres_iter)
         println("iteration $i done")
       end
iteration 1 done
iteration 2 done

julia> A*x
2-element Vector{Float64}:
 4.0
 8.0



That seems like useful information that could be preserved. I suggest opening a pull request to IterativeSolvers.jl to put this into the documentation as an example usage of the iterator interface :slight_smile:


Thanks, I’m working on it now.