Does Julia have any package for sparse linear solvers for LU decomposed sparse matrices that are multi-threaded?

I want to improve the solution speed of a big system which I have to solve ~500-700 times.

using LinearAlgebra
using SparseArrays
function create_mat(n::Int)
diagonal = spdiagm(0 => [1.0 for i in 1:n])
off_diagonal = spdiagm(-1 => [-0.01 for i in 2:n])
mat = diagonal + off_diagonal + off_diagonal'
return mat
end
n = 1000000
A = create_mat(n)
lu_A = lu(A , check=true);
x = randn(n);
for i = 1:500
x = lu_A \ x;
end

I am interested in keeping the LU factorization.
This example appears to be solving with a single thread.
Are there any multi-threaded sparse linear solvers I can use to solve this system faster?

You likely want to use BandedMatrices.jl if you have sparse matrices with a banded structure since they will give you O(n) factorization algorithms. In this case, it will still run single threaded, but it will be a lot more efficient.

There is LinearSolve.jl, which defines/wraps a lot of linear solvers in a common interface. It belongs to the SciML ecosystem, which is typically very performant.

using LinearAlgebra, SparseArrays, LinearSolve, Sparspak
import Pardiso
using BenchmarkTools
n = 1000000
A = create_mat(n)
lu_A = lu(A , check=true);
x = rand(n);
y=zeros(n);
@btime solve(LinearProblem(A, x;), UMFPACKFactorization());
@btime solve(LinearProblem(A, x;), KLUFactorization());
@btime solve(LinearProblem(A, x;), MKLPardisoFactorize());
@btime solve(LinearProblem(A, x;), SparspakFactorization());
@btime lu_A\x;
@btime ldiv!(y, lu_A, x);

814.338 ms (95 allocations: 1.04 GiB)
259.144 ms (56 allocations: 549.32 MiB)
1.280 s (47 allocations: 118.26 MiB)
1.275 s (182 allocations: 595.22 MiB)
17.547 ms (4 allocations: 15.26 MiB)
15.321 ms (0 allocations: 0 bytes)

The times are not fair has LinearSolve have to decompose + solve.
Can LinearSolve know that it only needs to decompose the matrix once and solve for many âbâ's?

UMFPACKFactorization(), 15.598 ms (0 allocations: 0 bytes)
KLUFactorization(), 13.361 ms (1 allocation: 16 bytes)
MKLPardisoFactorize(), (4 allocations: 15.26 MiB)
SparspakFactorization(), (10 allocations: 22.89 MiB)
lu_A\x; 17.547 ms (4 allocations: 15.26 MiB)
ldiv!(lu_A, x); 15.321 ms (0 allocations: 0 bytes)

I guess there is a slight improvement with KLUFactorization().
But only MKLPardisoFactorize() used my machine resources. The rest used ~30% capacity. So I guess they are not working in parallel.

It is one that I think is inherently pretty much impossible to parallelize. The function runs quickly, and the dependency chain is pretty long. If you want parallelism, your best bet is to have separate cores running different linear solves.

function benchsolve(A, x)
algs = [
# UMFPACKFactorization(),
# KLUFactorization(),
MKLPardisoFactorize(),
# SparspakFactorization()
]
for alg in algs
prob = LinearProblem(A, x);
linsolve=init(prob, alg);
for i in 1:500
solve!(linsolve, alg);
linsolve.b .= linsolve.u;
end
end
end
@btime benchsolve(A, x);