Increasing the solution speed of sparse linear system

aqw · May 12, 2024, 8:11pm

Hello,

Does Julia have any package for sparse linear solvers for LU decomposed sparse matrices that are multi-threaded?

I want to improve the solution speed of a big system which I have to solve ~500-700 times.

    using LinearAlgebra
    using SparseArrays
    function create_mat(n::Int)
        diagonal = spdiagm(0 => [1.0 for i in 1:n]) 
        off_diagonal = spdiagm(-1 => [-0.01 for i in 2:n])
        mat = diagonal + off_diagonal + off_diagonal' 
        return mat
    end
    
    
    n = 1000000 
    A = create_mat(n)
    lu_A = lu(A , check=true);

    x = randn(n);

    for i = 1:500
        x = lu_A \ x;
    end

I am interested in keeping the LU factorization.
This example appears to be solving with a single thread.
Are there any multi-threaded sparse linear solvers I can use to solve this system faster?

Best Regards

Oscar_Smith · May 12, 2024, 9:01pm

You likely want to use BandedMatrices.jl if you have sparse matrices with a banded structure since they will give you O(n) factorization algorithms. In this case, it will still run single threaded, but it will be a lot more efficient.

function create_band_mat(n::Int)
    return Symmetric(BandedMatrix(-1=>fill(-.01, n-1), 0=>fill(1.0,n), 1=>fill(-.01,n-1))) 
end
julia> @time A_band_chol = cholesky(A_band);
  0.098926 seconds (3 allocations: 22.888 MiB)
julia> @time A_band_chol\x;
  0.019270 seconds (2 allocations: 7.629 MiB)

aqw · May 13, 2024, 10:59am

Hi, thanks for the reply.

I would like to keep the LU factorization (my actual matrix is not exactly banded) and just increase the speed of lu_A \ x.

Given that A will remain constant. Is it possible to perform the ‘\’ operation concurrently and not in serial?

fgerick · May 13, 2024, 11:47am

You can do all solves at once:

x=randn(n,500)

lu_A\x

or if you don’t have all x before you can at least use ldiv!(y,lu_A,x) to avoid allocations in each solve.

And the solve is multi-threaded through BLAS, but you can also try Pardiso.jl as an alternative.

jamblejoe · May 13, 2024, 12:21pm

There is LinearSolve.jl, which defines/wraps a lot of linear solvers in a common interface. It belongs to the SciML ecosystem, which is typically very performant.

aqw · May 13, 2024, 6:59pm

Here is my attempt at using LinearSolve.

using LinearAlgebra, SparseArrays, LinearSolve, Sparspak
import Pardiso
using BenchmarkTools

n = 1000000
A = create_mat(n)
lu_A = lu(A , check=true);
x = rand(n);
y=zeros(n);

@btime solve(LinearProblem(A, x;), UMFPACKFactorization());
@btime solve(LinearProblem(A, x;), KLUFactorization());
@btime solve(LinearProblem(A, x;), MKLPardisoFactorize());
@btime solve(LinearProblem(A, x;), SparspakFactorization());
@btime lu_A\x;
@btime ldiv!(y, lu_A, x);

  814.338 ms (95 allocations: 1.04 GiB)
  259.144 ms (56 allocations: 549.32 MiB)
  1.280 s (47 allocations: 118.26 MiB)
  1.275 s (182 allocations: 595.22 MiB)
  17.547 ms (4 allocations: 15.26 MiB)
  15.321 ms (0 allocations: 0 bytes)

The times are not fair has LinearSolve have to decompose + solve.
Can LinearSolve know that it only needs to decompose the matrix once and solve for many “b”'s?

rayegun · May 13, 2024, 7:10pm

Yes

aqw · May 13, 2024, 7:40pm

Shouldn’t this work?

function solve(A, x)
    algs = [
                UMFPACKFactorization(),
                KLUFactorization(),
                MKLPardisoFactorize(),
                SparspakFactorization()
            ]

    for i in 1:4
        prob = LinearProblem(A, x);
        linsolve=init(prob);  
        sol1=solve!(linsolve, algs[i]);
        linsolve.b = x;
        @btime sol2=solve!(linsolve, algs[i]);
    end 
end

solve(A, x)

ChrisRackauckas · May 13, 2024, 7:45pm

Interesting. That’s worth digging into. @Oscar_Smith does a quick flamegraph reveal anything?

Oscar_Smith · May 13, 2024, 7:54pm

This is just missing some necessary macro interpolation. This works.

function benchsolve(A, x)
    algs = [
                UMFPACKFactorization(),
                KLUFactorization(),
                MKLPardisoFactorize(),
                SparspakFactorization()
            ]
 
    for alg in algs
        prob = LinearProblem(A, x);
        linsolve=init(prob, alg);
        linsolve.b = x;
        @btime sol2=solve!($linsolve, $alg) setup=$linsolve.b.=$x;
    end 
end

aqw · May 13, 2024, 8:03pm

Updated times:

UMFPACKFactorization(), 15.598 ms (0 allocations: 0 bytes)
KLUFactorization(),  13.361 ms (1 allocation: 16 bytes)
MKLPardisoFactorize(), (4 allocations: 15.26 MiB)
SparspakFactorization(), (10 allocations: 22.89 MiB)

lu_A\x;  17.547 ms (4 allocations: 15.26 MiB)
ldiv!(lu_A, x);  15.321 ms (0 allocations: 0 bytes)

I guess there is a slight improvement with KLUFactorization().
But only MKLPardisoFactorize() used my machine resources. The rest used ~30% capacity. So I guess they are not working in parallel.

Oscar_Smith · May 13, 2024, 8:06pm

It is one that I think is inherently pretty much impossible to parallelize. The function runs quickly, and the dependency chain is pretty long. If you want parallelism, your best bet is to have separate cores running different linear solves.

ChrisRackauckas · May 13, 2024, 10:59pm

Well it’s possible the new ParU method in SuiteSparse could do better. @rayegun how’s that coming along?

aqw · May 15, 2024, 9:38am

Is it normal for the allocations to be high ?

function benchsolve(A, x)
    algs = [
                # UMFPACKFactorization(),
                # KLUFactorization(),
                MKLPardisoFactorize(),
                # SparspakFactorization()
            ]
    for alg in algs
        prob = LinearProblem(A, x);
        linsolve=init(prob, alg);

        for i in 1:500
            solve!(linsolve, alg);
            linsolve.b .= linsolve.u;
        end
    end 
end

@btime benchsolve(A, x);

Gives: 13.833 s (3042 allocations: 7.55 GiB)

ChrisRackauckas · May 19, 2024, 7:42pm

No, not normal. Something does seem off there. What’s the reproducer?

Topic		Replies	Views
Solving Sparse Linear Systems fast Performance sparse , linearsolve	11	5015	June 23, 2022
Is it possible to parallelize matrix division Performance	10	666	October 1, 2023
Is the linear system solver \ also multi threaded in Julia as in Matlab? And how to “multithread” it in Julia? Performance linearalgebra , sparse	33	2434	August 29, 2024
How to solve this Ax=b faster? Numerics linearalgebra , linearsolve	24	2554	June 7, 2022
Linear solve slower with Banded Matrix General Usage linearalgebra	8	141	September 25, 2024

Increasing the solution speed of sparse linear system

Related topics