Solving A\b in parallel

I have an access to a large shared-memory machine. It has memory in the order of TB, and has 64 CPU cores.

If I want to perform A\b for a square matrix A and column vector b on this shared-memory machine using multiple CPU cores, what is the workflow? A is stored in Julia’s sparse matrix format.

Did A\b not work? What kind of matrix is A? How sparse is it? Is it symmetric positive definite? Non-symmetric?

A\b works, but how do we make sure it runs on N CPU cores?

Check htop. You’ll see that it’s already multithreaded.

So you mean I don’t need to do something like julia -np N scriptname.jl to make A\b in scriptname.jl use N CPU cores?

In fact, don’t do that. Adding processes sets the number of BLAS threads to 1, which is what you don’t want.

You can also try out https://github.com/JuliaSparse/Pardiso.jl/. I’ve gotten significantly higher performance with it than with the solvers coming with julia on high core machines.

3 Likes

To clarify what is only really implied here, and please correct me if I’m wrong: linear algebra ops are automatically parallel if you have BLAS set up right (run versioninfo() and look for libopenblas). BLAS operations use a different kind of parallel computing than Julia native parallel as described in the parallel doc

1 Like

Solving a sparse linear system will not use BLAS directly, although it is likely used by the sparse solver.

1 Like

@kristoffer.carlsson, I am testing Pardiso.jl, but the performance enhancement with the number of CPU cores is a bit disappointing. The same problem is solved in about 40 seconds with 2 cores, and about 30 seconds with 16 cores. Is this typical, or you think something is wrong?

How big is the matrix?

It is million by million.

Also, do you have an example with set_solver!(ps, ITERATIVE_SOLVER)? I am solving a system with slowly evolving matrix, so I would like to perform factorization only once by using ITERATIVE_SOLVER, but not sure where to call this. Don’t see a good example in the PARDISO documentation, either.

So, let’s suppose I am using multiprocessing and solving sparse matrix systems. Do I have to set manually the number of BLAS threads for each process?

Did you find a way?
I also want to use Pardiso.jl in the same manner. Factorize once, use many times (For different RHS).

I thought iterative mode means the way the system is solved not use it many times.