How to solve this Ax=b faster?

Side note: Don’t do this. It allocates an array - no! two arrays - needlessly. It is of course not an effect of any significance here, but it’s a good idea to mind unnecessary allocations in general.

Instead, do (1:331060) .* 6 or 6:6:(331060*6).
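
For instance, a minimal check (the length 331060 is just the one from the snippet above; both forms stay lazy ranges, so no vector is materialized):

idx_a = (1:331060) .* 6      # broadcasting a range by a scalar yields another range, not a Vector
idx_b = 6:6:(331060*6)       # equivalent explicit StepRange
idx_a == idx_b               # true; either can be used directly for indexing or iteration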

5 Likes

Thank you! I’ll check this out.

Thanks. Noted.

Could you explain this a little more? I thought that sparse systems have separate solvers that act on them. Sparse Linear Algebra · The Julia Language
But I do face an out-of-memory error when trying cholesky(A).

Sparse matrices encountered in many real-world problems have structure that makes it possible to do Gaussian elimination without the factors becoming dense in intermediate steps (this densification is called “fill-in”). If your sparse matrix has a completely random sparsity pattern, Gaussian elimination doesn’t work well, and you should switch to a package that implements iterative methods, which are insensitive to the fill-in problem.
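
A quick way to see fill-in on a concrete matrix (a sketch, assuming A is a SparseMatrixCSC):

using SparseArrays, LinearAlgebra
F = lu(A)                        # sparse LU via UMFPACK
nnz(A), nnz(F.L) + nnz(F.U)      # the factors typically carry far more nonzeros than A itself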

1 Like

You might want to look at this reply by @stevengj. I haven’t tried it myself yet, but he mentions two parallel solvers, MUMPS.jl and Pardiso.jl, that might be of help.

Just use LinearSolve.jl

http://linearsolve.sciml.ai/dev/

And then run the gamut.

using LinearSolve, Pardiso  # Pardiso.jl is needed for the MKLPardiso* algorithms
prob = LinearProblem(A, b)  # A is your sparse matrix, b the right-hand side
for alg in [KrylovJL_GMRES(), KLUFactorization(), KrylovKitJL_GMRES(), IterativeSolversJL_GMRES(), UMFPACKFactorization(), MKLPardisoFactorize(), MKLPardisoIterate()]
  sol = solve(prob, alg)
end

Likely MKLPardiso is best without a preconditioner. For the GMRES ones, you may want to define a preconditioner like:

http://linearsolve.sciml.ai/dev/basics/Preconditioners/
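
For example, a minimal sketch along the lines of those docs, using an algebraic multigrid preconditioner from AlgebraicMultigrid.jl (the package choice is illustrative, not part of the original suggestion):

using LinearSolve, AlgebraicMultigrid
prob = LinearProblem(A, b)                   # A sparse, b the right-hand side
ml  = ruge_stuben(A)                         # build an AMG hierarchy
p   = aspreconditioner(ml)                   # wrap it as a preconditioner
sol = solve(prob, KrylovJL_GMRES(), Pl = p)  # pass it as the left preconditioner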

1 Like

It seems like Pardiso is not free for academic users anymore. It let me sign up, but I received a mail saying the academic license was discontinued from Jan 2022.

Thank you. I was able to solve it with Krylov. But I’ll check this out as well!

1 Like

Learned an important point from you and @Jae-Mo_Lihm.

If this problem is occurring as part of an optimization problem, I can recommend COSMO.jl.

I tried LinearSolve on a 3D Finite Element Problem with 68’170 elements and 82’424 nodes of the type

\mathbf{A}\mathbf{u} = \mathbf{r}

where \mathbf{A} is a sparse SPD matrix with 162’372 rows and 10’889’190 nonzero elements (on average 67 nonzero entries per row). I tried the following:

using LinearSolve
prob = LinearProblem(A,r);

for alg in [KrylovJL_GMRES(), KLUFactorization(), KrylovKitJL_GMRES(), IterativeSolversJL_GMRES(), UMFPACKFactorization()]
  @time sol = LinearSolve.solve(prob,alg)
end

180.594400 seconds (2.56 M allocations: 2.356 GiB, 3.02% gc time, 0.45% compilation time)
217.641759 seconds (4.28 M allocations: 3.479 GiB, 0.43% gc time)
181.348235 seconds (244.91 k allocations: 2.233 GiB, 3.01% gc time, 0.05% compilation time)
954.111518 seconds (4.03 M allocations: 450.389 MiB, 0.05% gc time, 0.17% compilation time)
 13.303307 seconds (121.40 k allocations: 2.719 GiB, 5.24% gc time, 0.96% compilation time)

The MKL solvers gave an UndefVarError, so I couldn’t try them.
I also tried Krylov.jl, and got

using Krylov
@time x, stats = Krylov.cg(A,r)

 80.857184 seconds (1.28 M allocations: 77.709 MiB, 0.88% compilation time)

If I now compare these results with the standard LinearAlgebra.jl package, this is what I get:

@time A\r;

  3.062161 seconds (661 allocations: 1.390 GiB, 1.07% gc time, 0.13% compilation time)

It seems that the LinearAlgebra solver outperforms all the others on this problem. Should that be the case, or am I not getting the best speed from the other solvers for some reason? Or is it the size of the problem that influences the performance?

(please do repost this as a new thread if you think it is appropriate)

1 Like

And it’s not just compile time? The LinearAlgebra default is just UMFPACKFactorization so it should be the same time. Do you have a good way for me to reproduce this?

For the Krylov methods, did you try adding any preconditioners? Algebraic multigrid is probably a good idea here.

The matrix \mathbf{A} comes from an FE mesh with hexahedral elements for a solid mechanics problem, with 3 degrees of freedom (DoFs) per node. I have Dirichlet boundary conditions on the boundary and free internal nodes, so the problem is constructed as follows

\mathbf{A} = \mathbf{Kt}[\textrm{iDoFs},\textrm{iDoFs}]\\ \mathbf{r} = \mathbf{Kt}[\textrm{iDoFs},\textrm{bDoFs}]\mathbf{s}[\textrm{bDoFs}]

where \mathbf{Kt} is the stiffness matrix of the entire domain, \textrm{bDoFs} are the indices of the boundary DoFs, \textrm{iDoFs} are the indices of the internal DoFs, and \mathbf{s} is the full DoF vector. \mathbf{A} is Hermitian, with size(A) = (162372, 162372).
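
In code, that construction is roughly (a sketch reusing the names from the formulas above):

A = Kt[iDoFs, iDoFs]             # stiffness restricted to the internal DoFs
r = Kt[iDoFs, bDoFs] * s[bDoFs]  # contribution of the prescribed boundary values
u = A \ r                        # solve for the internal DoFs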

Depending on whether a node is internal or on the boundary, it can be surrounded by 1 to 8 elements, which determines the number of nonzero entries in each row of the matrix; all diagonal entries are nonzero. It is quite a structured matrix, which is difficult to obtain by random generation. If it helps, I can share the matrix and the vectors as JLD2 files.

This is what I tried

using BenchmarkTools, LinearAlgebra  # for @btime and norm

u = @btime A\r
@show norm(r - A*u);

  3.843 s (52 allocations: 1.39 GiB)
norm(r - A * u) = 5.766109468498555e-12
u = @btime lu(A)\r
@show norm(r-A*u);

  10.340 s (72 allocations: 2.39 GiB)
norm(r - A * u) = 4.227052608619862e-12

then with LinearSolve.jl

prob = LinearProblem(A,r);

sol = @btime LinearSolve.solve(prob, UMFPACKFactorization())
@show norm(r-A*sol.u);

  10.984 s (94 allocations: 2.71 GiB)
norm(r - A * sol.u) = 4.227052608619862e-12

then, using preconditioners from AlgebraicMultigrid.jl with the Krylov methods, I get

using AlgebraicMultigrid

m   = AlgebraicMultigrid.ruge_stuben(A)
p   = AlgebraicMultigrid.aspreconditioner(m)
sol = @btime LinearSolve.solve(prob, LinearSolve.KrylovKitJL_GMRES(), Pl=p)
@show norm(r-A*sol.u);

 51.581 s (603 allocations: 471.36 MiB)
norm(r - A * sol.u) = 7.902120848944424e-5

and

m   = AlgebraicMultigrid.smoothed_aggregation(A)
p   = AlgebraicMultigrid.aspreconditioner(m)
sol = @btime LinearSolve.solve(prob, LinearSolve.KrylovKitJL_GMRES(), Pl=p)
@show norm(r-A*sol.u);

 46.302 s (613 allocations: 477.56 MiB)
norm(r - A * sol.u) = 0.0001005812834686952

All of these were run in a JupyterLab notebook with a single thread. Different runs of @btime differ by up to 1 s.
Please let me know if you have any questions or need more information.

Side note: lu and UMFPACKFactorization() arrive at the same solution:

sol = LinearSolve.solve(prob, UMFPACKFactorization())
u   = lu(A)\r
@show norm(u-sol.u);

norm(u - sol.u) = 0.0

What LinearSolve.jl is doing is equivalent to the lu(A)\b route, which will default to UMFPACK in Base Julia, but I did not know that there was a specialization of \ that was faster. Interesting, I wonder what shortcut it takes. @rayegun could you take a look at this?

3 Likes

this is what I get from @which A\r :

~/julia/julia-1.6.4/share/julia/stdlib/v1.6/SparseArrays/src/linalg.jl:1551

I’ll take a look

Unfortunately that’s just the dispatch method, which doesn’t give much info. You said A is Hermitian?

Can you send me the JLD2 file? If you need to send it privately you can transmit it to kimmerer@mit.edu somehow.

It looks like it’s calling CHOLMOD, by dispatching on ishermitian. Should we duplicate this codepath @ChrisRackauckas? I’m not even sure if we wrap CHOLMOD very well in LinearSolve.jl yet. julia-1.7.2+0~x64/share/julia/stdlib/v1.7/SuiteSparse/src/cholmod.jl:1573. I am working on familiarizing myself with the CHOLMOD codebase anyway right now.
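
In other words, the sparse \ seems to take roughly this path (a simplified sketch of the dispatch, not the actual stdlib code; the fallbacks are abbreviated):

using SparseArrays, SuiteSparse, LinearAlgebra

function backslash_sketch(A::SparseMatrixCSC, r::AbstractVector)
    if ishermitian(A)
        F = cholesky(Hermitian(A); check = false)  # CHOLMOD Cholesky
        issuccess(F) && return F \ r
        return ldlt(Hermitian(A)) \ r              # Hermitian but not positive definite
    end
    return lu(A) \ r                               # general sparse fallback (UMFPACK)
end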

1 Like

Oh, that completely changes the game. Yes, a Cholesky factorization is going to be way faster than any other factorization if the matrix is SPD! I didn’t catch that detail (now that I re-read, there was a quick mention of SPD).

Yes, we should wrap CHOLMOD for this case. Also, for SPD matrices you shouldn’t use GMRES; instead you should use CG (KrylovJL_CG).

2 Likes

Here I uploaded a JLD2 file and a small script that runs the tests.

The output of the script follows:

loading variables ... 
A, r, 
size(A) = (162372, 162372)
ishermitian(A) = true
norm(r) = 9027.184755276294

doing u = A\r ...
  3.878 s (52 allocations: 1.39 GiB)
norm(r - A * u) = 5.766109468498555e-12

doing u = lu(A)\r ...
  13.797 s (72 allocations: 2.39 GiB)
norm(r - A * u) = 4.227052608619862e-12

doing LinearSolve.solve(prob, UMFPACKFactorization()) ...
  12.981 s (94 allocations: 2.71 GiB)
norm(r - A * sol.u) = 4.227052608619862e-12

doing LinearSolve.solve(prob, KrylovJL_CG()) ...
  70.360 s (50 allocations: 179.79 MiB)
norm(r - A * sol.u) = 0.00013389304531825536

You can tell KrylovJL_CG() is a conjugate gradient method by looking at how little memory it allocates. I would also have expected a CG method to be faster on an SPD problem (probably the matrix is not big enough to outperform a very well optimized direct method).

1 Like

@Andrea_Vigliotti, the convergence rate of CG depends highly on the distribution of the eigenvalues of A.
With a good preconditioner, the Krylov method may require only a few iterations.
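
For example, a sketch that combines KrylovJL_CG with the AMG preconditioner built earlier in the thread (the tolerances are illustrative):

using LinearSolve, AlgebraicMultigrid
ml   = smoothed_aggregation(A)                # AMG hierarchy for the SPD matrix
p    = aspreconditioner(ml)                   # left preconditioner
prob = LinearProblem(A, r)
sol  = solve(prob, KrylovJL_CG(), Pl = p, abstol = 1e-10, reltol = 1e-10)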

2 Likes