How to solve this Ax=b faster?

I am trying to solve a sparse stiffness matrix (positive definite) of size 1986360x1986360 SparseMatrixCSC{Float64, Int64} with 18327153 stored entries.
Here’s a sample code.

using SparseArrays
using LinearAlgebra

A = sprandn(1986360, 1986360, 2e-6).*20;
A = A+A'+100I

b = zeros(size(a,1));
b[[1:331060;].*6] .= 200;

x = A\b

The problem is that the last step takes so much time as it uses only one core (have never actually finished it). I tried running it via julia REPL normally as well as by launching julia with julia --threads=16.

I also tried solving it as x=cholesky(A)\b but that also uses only one core. It also throws SuiteSparse.CHOLMOD.CHOLMODException("out of memory") after sometime even though 23 gb of memory(out of 32) is available.

Is there anyway to make julia use all the cores? Also, I am trying solve such a large matrix for the first time. So, please do suggest if there is something I can do to solve this faster.

1 Like

Have you tried iterative solvers?

There is Krylov.jl and many others.


Random sparse systems should be solved by iterative solvers, since the default direct solver causes fill-in during Gaussian elimination and blows up the memory usage.


Side note: Don’t do this. It allocates an array - no! two arrays - needlessly. It is of course not an effect of any significance here, but it’s a good idea to mind unnecessary allocations in general.

Instead, do (1:331060) .* 6 or 6:6:(331060*6).


Thank you! I’ll check this out.

Thanks. Noted.

Could you explain this a little more? I thought that sparse systems have separate solvers that act on them. Sparse Linear Algebra · The Julia Language
But I do face out of memory error while trying cholesky(A).

Sparse matrices encountered in many real-world problems have some structures, which makes it possible to do Gaussian elimination without turning the matrix into a dense matrix in intermediate steps (called “fill-in”). If your sparse matrix is completely random, then Gaussian elimination doesn’t work well, and you should switch to a package that implements iterative methods which are insensitive to the fill-in problem.

1 Like

you might want to look at this reply by @stevengj, I haven’t tried it yet myself, but he mentions two parallel solvers MUMPS.jl and Pardiso.jl that might be of help

Just use LinearSolve.jl

And then run the gambit.

for alg in [KrylovJL_GMRES(), KLUFactorization(), KrylovKitJL_GMRES(), IterativeSolversJL_GMRES(), UMFPACKFactorization(), MKLPardisoFactorize(), MKLPardisoIterate()]
  sol = solve(prob,alg)

Likely MKLPardiso is best without a preconditioner. For the GMRES ones, you may want to define a preconditioner like:

1 Like

It seems like Pardiso is not free for academic users anymore. Lets me sign up but received a mail saying academic license discontinued from Jan 2022.

Thank you. I was able to solve it with Krylov. But I’ll check this out as well!

1 Like

Learned an important point from you and @Jae-Mo_Lihm

If this problem is occuring as part of an optimization problem, I can recommend COSMO.jl

I tried LinearSolve on a 3D Finite Element Problem with 68’170 elements and 82’424 nodes of the type

\mathbf{A}\mathbf{u} = \mathbf{r}

where \mathbf{A} is a sparse SPD matrix with 162’372 rows with 10’889’190 non zero elements (on average 67 non zero entries per row). I tried the following

using using LinearSolve
prob = LinearProblem(A,r);

for alg in [KrylovJL_GMRES(), KLUFactorization(), KrylovKitJL_GMRES(), IterativeSolversJL_GMRES(), UMFPACKFactorization()]
  @time sol = LinearSolve.solve(prob,alg)

180.594400 seconds (2.56 M allocations: 2.356 GiB, 3.02% gc time, 0.45% compilation time)
217.641759 seconds (4.28 M allocations: 3.479 GiB, 0.43% gc time)
181.348235 seconds (244.91 k allocations: 2.233 GiB, 3.01% gc time, 0.05% compilation time)
954.111518 seconds (4.03 M allocations: 450.389 MiB, 0.05% gc time, 0.17% compilation time)
 13.303307 seconds (121.40 k allocations: 2.719 GiB, 5.24% gc time, 0.96% compilation time)

the MKL solvers gave and UndefVarError so I couldnt try them.
I also tried Krylov.jl, and got

using Krylov
@time x, stats =,r)

 80.857184 seconds (1.28 M allocations: 77.709 MiB, 0.88% compilation time)

if now I compare these performances with the standard LinearAlgebra.jl package, this is what I get

@time A\r;

  3.062161 seconds (661 allocations: 1.390 GiB, 1.07% gc time, 0.13% compilation time)

it seems that the LinearAlgebra solver, on this problem, outperforms all the others, should that be the case or am I not getting the best speed from the other solvers for some reason? or is it the size of the problem that influences the performances.

(please, do repost this as a new thread if think it is appropriate)

1 Like

And it’s not just compile time? The LinearAlgebra default is just UMFPACKFactorization so it should be the same time. Do you have a good way for me to reproduce this?

For the Krylov methods, did you try adding any preconditioners? Algebraic multigrid is probably a good idea here.

The matrix \mathbf{A} is the result of a FE mesh with hexahedral elements of a solid mechanics problem, with 3 degrees of freedom (DoFs) per node. I have Dirichlet boundary conditions on the boundary and free internal nodes, thus the problem is constructed as follows

\mathbf{A} = \mathbf{Kt}[\textrm{iDoFs},\textrm{iDoFs}]\\ \mathbf{r} = \mathbf{Kt}[\textrm{iDoFs},\textrm{bDoFs}]\mathbf{s}[\textrm{bDoFs}]

where \mathbf{Kt} is the stiffness matrix of the entire domain, \textrm{bDoFs} are the indices of the boundary DoFs and \textrm{iDoFs} are the indices of the internal DoFs, \mathbf{s} is the entire DoFs vector. \mathbf{A} is Hermitian, with size(A)=(162372, 162372)

Depending on the position of the element, whether internal or on the boundary, each node can be surrounded from 1 up to 8 elements, which decides the number of of non zero entries per each row of the matrix, all diagonal entries are non zero. It is a quite structured matrix, which is difficult to obtain by random generation, if it helps can share the matrix and the vectors as JLD2 files.

This is what I tried

u = @btime  A\r
@show  norm(r-A*u);

  3.843 s (52 allocations: 1.39 GiB)
norm(r - A * u) = 5.766109468498555e-12
u = @btime lu(A)\r
@show norm(r-A*u);

  10.340 s (72 allocations: 2.39 GiB)
norm(r - A * u) = 4.227052608619862e-12

then with LinearSolve.jl

prob = LinearProblem(A,r);

sol = @btime LinearSolve.solve(prob, UMFPACKFactorization())
@show norm(r-A*sol.u);

  10.984 s (94 allocations: 2.71 GiB)
norm(r - A * sol.u) = 4.227052608619862e-12

then using precodintioners from AlgebraicMultigrid.jl with Krylov methods I get

m   = AlgebraicMultigrid.ruge_stuben(A)
p   = AlgebraicMultigrid.aspreconditioner(m)
sol = @btime LinearSolve.solve(prob, LinearSolve.KrylovKitJL_GMRES(), Pl=p)
@show norm(r-A*sol.u);

 51.581 s (603 allocations: 471.36 MiB)
norm(r - A * sol.u) = 7.902120848944424e-5


m   = AlgebraicMultigrid.smoothed_aggregation(A)
p   = AlgebraicMultigrid.aspreconditioner(m)
sol = @btime LinearSolve.solve(prob, LinearSolve.KrylovKitJL_GMRES(), Pl=p)
@show norm(r-A*sol.u);

 46.302 s (613 allocations: 477.56 MiB)
norm(r - A * sol.u) = 0.0001005812834686952

all of these have run on Jupyter-lab notebook with a single thread. Different runs of @btime differ of up to 1 sec.
Please do let me know if you have any question or need more information.

side note: lu and UMFPACKFactorization() terminate on the same solution:

sol = LinearSolve.solve(prob, UMFPACKFactorization())
u   = lu(A)\r
@show norm(u-sol.u);

norm(u - sol.u) = 0.0

What LinearSolve.jl is doing is equivalent to the lu(A)\b route, which will default to UMFPACK in Base Julia, but I did not know that there was a specialization of \ that was faster. Interesting, I wonder what shortcut it takes. @Wimmerer could you take a look at this?


this is what I get from @which A\r :


I’ll take a look

Unfortunately that’s the dispatch method, which doesn’t give much info, you said A is Hermitian?

Can you send me the JLD2 file? If you need to send it privately you can transmit it to somehow.

It looks like it’s calling CHOLMOD, by dispatching on ishermitian. Should we duplicate this codepath @ChrisRackauckas? I’m not even sure if we wrap CHOLMOD very well in LinearSolve.jl yet. julia-1.7.2+0~x64/share/julia/stdlib/v1.7/SuiteSparse/src/cholmod.jl:1573. I am working on familiarizing myself with the CHOLMOD codebase anyway right now.

1 Like