I have access to a large shared-memory machine. It has on the order of terabytes of memory and 64 CPU cores.
If I want to perform A\b for a square matrix A and column vector b on this shared-memory machine using multiple CPU cores, what is the workflow? A is stored in Julia’s sparse matrix format.
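For concreteness, here is a minimal stand-in for the operation in question, using only the standard library (the matrix here is an illustrative tridiagonal example, not the actual problem):

```julia
using SparseArrays, LinearAlgebra

# A small stand-in for the real problem: a sparse matrix in Julia's
# SparseMatrixCSC format and a dense right-hand side.
n = 1000
A = spdiagm(-1 => fill(-1.0, n-1), 0 => fill(2.0, n), 1 => fill(-1.0, n-1))
b = rand(n)

# `\` dispatches to a sparse direct solver from SuiteSparse.
x = A \ b

# Check the residual of the computed solution.
resid = norm(A * x - b)
```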
You can also try out https://github.com/JuliaSparse/Pardiso.jl/. I’ve gotten significantly higher performance with it than with the solvers that ship with Julia on high-core-count machines.
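A rough sketch of what this looks like with Pardiso.jl (untested here; it requires a working PARDISO installation, and the MKL backend is just one option):

```julia
using Pardiso, SparseArrays, LinearAlgebra

# MKL PARDISO backend; PardisoSolver() uses the project PARDISO library instead.
ps = MKLPardisoSolver()
set_nprocs!(ps, 16)              # number of threads PARDISO may use

# Illustrative well-conditioned sparse system.
A = sprand(1000, 1000, 0.01) + 1000I
b = rand(1000)

x = solve(ps, A, b)
```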
To clarify what is only implied here, and please correct me if I’m wrong: linear algebra operations are automatically parallel if BLAS is set up correctly (run versioninfo() and look for libopenblas). Note that BLAS operations use a different kind of parallelism than Julia’s native parallelism as described in the parallel computing docs.
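The two thread pools can be inspected and controlled separately; a quick check (the thread counts here are just examples):

```julia
using LinearAlgebra

# BLAS threading is configured at runtime, independently of Julia's own
# threads (which are set via `--threads` or JULIA_NUM_THREADS at startup).
BLAS.set_num_threads(4)
blas_threads  = BLAS.get_num_threads()
julia_threads = Threads.nthreads()
```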
@kristoffer.carlsson, I am testing Pardiso.jl, but the performance scaling with the number of CPU cores is a bit disappointing. The same problem is solved in about 40 seconds with 2 cores, but only drops to about 30 seconds with 16 cores. Is this typical, or do you think something is wrong?
Also, do you have an example using set_solver!(ps, ITERATIVE_SOLVER)? I am solving a system with a slowly evolving matrix, so I would like to perform the factorization only once by using ITERATIVE_SOLVER, but I’m not sure where this call belongs. I don’t see a good example in the PARDISO documentation, either.
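For what it’s worth, this is roughly what I’m attempting (an untested sketch; the placement of set_solver! before the first solve is my guess, and the phase constants are from Pardiso.jl’s low-level interface):

```julia
using Pardiso, SparseArrays, LinearAlgebra

# set_solver! is only available on the project PARDISO backend, not MKL.
ps = PardisoSolver()
set_solver!(ps, Pardiso.ITERATIVE_SOLVER)    # guessed placement: before any solve
set_matrixtype!(ps, Pardiso.REAL_NONSYM)

# Illustrative slowly-evolving system.
A = sprand(1000, 1000, 0.01) + 1000I
b = rand(1000)
x = similar(b)

# Analyze and factorize once...
set_phase!(ps, Pardiso.ANALYSIS_NUM_FACT)
pardiso(ps, x, A, b)

# ...then reuse the factorization as A evolves slowly.
set_phase!(ps, Pardiso.SOLVE_ITERATIVE_REFINE)
pardiso(ps, x, A, b)
```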