Using multiple in-place krylov solver from Krylov.jl

I’m trying to solve several linear equations that each change with each step of time.

For example, I have a 100x300 grid that I need to solve for every time step.
I’m trying to use the in-place solvers from Krylov.jl. I make a matrix of multiple BicgstabSolver() in order to allocate memory so that it can be used in multithreading but they all seem to change to the last value?

Here’s the example:

using Krylov

AA = [1.0+3.0im 2.0im; 2.0+0.0im 1.0+4.0im]
rhs = rand(ComplexF64, 2);

bicstab_solver = BicgstabSolver(length(rhs),length(rhs), typeof(rhs))

BICSTAB_Solver = fill(deepcopy(bicstab_solver), (100,300));

function testkrylovsolver(bsolv, AA)

    for i in 1:size(bsolv)[2]
        for j in 1:size(bsolv)[1]
            rhs = rand(ComplexF64, 2);
            Krylov.bicgstab!(bsolv[j,i],AA,rhs);

        end
    end
    return nothing

end

testkrylovsolver(BICSTAB_Solver, AA)

To test:

BICSTAB_Solver[1,1].x
Output:

2-element Vector{ComplexF64}:
 0.0638220002534428 + 0.022209886941768144im
 0.1879664301184788 - 0.036897073475912104im

and

BICSTAB_Solver[100,6].x

Output:

2-element Vector{ComplexF64}:
 0.0638220002534428 + 0.022209886941768144im
 0.1879664301184788 - 0.036897073475912104im

It looks like doing something like:

BICSTAB_Solver = [BicgstabSolver(length(rhs),length(rhs), typeof(rhs)) for i in 1:100, j in 1:300];

Seems to work but for larger system this becomes extremely slow in comparison with:

BICSTAB_Solver = fill(deepcopy(BicgstabSolver(length(rhs),length(rhs), typeof(rhs))), (100,300))

Anyone know the reason why these two would be different?

Aside: for sparse matrices arising from 2d grids, especially small 2d grids, you are often better off with a sparse-direct solver (e.g. Julia’s default \ based on SuiteSparse, or alternative ones like MUMPS.jl etcetera) than an iterative solver.

1 Like

Ok I figured it out. I’m going to post with a multithread example:

using Krylov

Nx = 100
Nt = 500

AA = [1.0+3.0im 2.0im; 2.0+0.0im 1.0+4.0im]
rhs = rand(ComplexF64, 2);
#make BicstabSolver for each column in grid for in-place method
BICSTAB_Solver = [ BicgstabSolver(length(rhs),length(rhs), typeof(rhs)) for i in 1:Nx ];

#make rhs of random
RHS = [rand(ComplexF64, 2) for i in 1:500, j in 1:Nx];
#store solutions
sol = [rand(ComplexF64, 2) for i in 1:500, j in 1:Nx];

function testkrylovsolver(sol, bsolv, AA, RHS)

    @threads for i in 1:size(RHS)[2]
        for j in 1:size(RHS)[1]
            #rhs = rand(ComplexF64, 2);
            #RHS[j,i] .= rhs
            Krylov.bicgstab!(bsolv[i],AA,RHS[j,i]);
            sol[j,i] .= bsolv[i].x

        end
    end
    return nothing

end

Then we use it:

testkrylovsolver(sol,BICSTAB_Solver, AA,RHS)

Which gives results:

AA*sol[1,1] - RHS[1,1]
Output:

2-element Vector{ComplexF64}:
 -5.551115123125783e-17 + 0.0im
 -2.220446049250313e-16 + 1.1102230246251565e-16im

Which is close enough to zero

And:

AA*sol[100,100] - RHS[100,100]

Output:

2-element Vector{ComplexF64}:
 5.551115123125783e-17 - 5.551115123125783e-17im
                   0.0 - 2.7755575615628914e-17im

Which again is close enough to zero.
Because multithreading is splitting up the grid by the columns, one can reuse the same KrylovSolver as it traverses the the rows of the column.

1 Like

Hi @erny123!
I’m glad that you use Krylov.jl for solving sparse linear systems with complex numbers!

I suggest to also try block_gmres for your application because you want to solve AX= B where B is a matrix and not a vector.

If you want a solution with high-precision, you can set the keyword arguments rtol=0.0 and atol=1e-12.

If you have any question, don’t hesitate. :smiley:

These two are indeed different. fill takes as its first argument just a value, and fills an array with that value. So your statement

fill(deepcopy(BicgstabSolver(length(rhs),length(rhs), typeof(rhs))), (100,300))

just takes a single deepcopy from the BicgstabSolver, and then this one copy is used to fill the array, i.e. all elements of the resulting array refer to the same single copy.

Instead, the statement

[BicgstabSolver(length(rhs),length(rhs), typeof(rhs)) for i in 1:100, j in 1:300]

considers the part before for to be a function of your iteration variables, in this case i and j, and reevaluates this statement for every i and j. So in this case, you get 300*100 different BicgstabSolver instances. That is of course slower and uses more memory, but seems to be the behaviour you were hoping to get.