Sparse Cholesky factorization with multiple threads

Hi. I’m relatively new to Julia and I am trying to use multiple threads to speed up the sparse Cholesky factorization from SuiteSparse. I have a total of 32 physical cores available on the machine I’m using.

When using BLAS, you can easily change the number of threads with BLAS.set_num_threads(), and I get nice speedups when I compare, for example, a dense matrix-matrix multiply using 32 threads vs. 16 threads. However, I can’t seem to get any noticeable speed-up for the sparse Cholesky factorization. Can you somehow set the number of threads that SuiteSparse uses, as you can for BLAS? Or does SuiteSparse automatically fix the number of threads? What would be the most suitable way for me to try to achieve some speed-up?
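For reference, here is a minimal sketch of the kind of timing comparison described above. The test matrix and the helper names `random_spd` and `time_sparse_cholesky` are made up for illustration and are not taken from any real workload. It assumes Julia 1.6 or later (for `BLAS.get_num_threads`) and it only measures whether the BLAS thread count changes the factorization time; it does not show that it will.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical test problem: a random sparse symmetric positive definite matrix.
function random_spd(n; density = 5 / n)
    B = sprandn(n, n, density)
    return B' * B + n * I        # the diagonal shift keeps the matrix positive definite
end

# Time one sparse Cholesky factorization for each requested BLAS thread count.
function time_sparse_cholesky(A, thread_counts)
    S = Symmetric(A)
    cholesky(S)                          # warm-up run so compilation is not timed
    old = BLAS.get_num_threads()         # remember the current setting
    timings = Dict{Int,Float64}()
    for nt in thread_counts
        BLAS.set_num_threads(nt)
        timings[nt] = @elapsed cholesky(S)
    end
    BLAS.set_num_threads(old)            # restore the original setting
    return timings
end

A = random_spd(2_000)
for (nt, t) in sort(collect(time_sparse_cholesky(A, (1, 2, 4))))
    println("BLAS threads = $nt: $(round(t; digits = 4)) s")
end
```

With a matrix this small and this sparse the timings may be dominated by noise, so a realistic test should use a matrix from the actual application and repeat each measurement several times.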

Side question: the number of BLAS threads I can use seems to be capped at 32, but using hyper-threading I should be able to use 64. So I wonder, why is this value capped at 32?



The answer to your last question is that using more threads than cores isn’t worth it for BLAS, because you will run out of memory bandwidth before you run out of cores.
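If you want to check this on your own machine, something like the following sketch would do. The function name `time_matmul` and the matrix size are arbitrary choices for illustration, and it assumes Julia 1.6 or later for `BLAS.get_num_threads`. It records the thread count the BLAS library actually reports after each request, since the library may clamp the value, and times a dense multiply at each setting.

```julia
using LinearAlgebra

# Time a dense n-by-n matrix product for each requested BLAS thread count.
function time_matmul(n, thread_counts)
    X = randn(n, n)
    Y = randn(n, n)
    X * Y                                  # warm-up run so compilation is not timed
    old = BLAS.get_num_threads()           # remember the current setting
    results = Pair{Int,Float64}[]
    for nt in thread_counts
        BLAS.set_num_threads(nt)
        actual = BLAS.get_num_threads()    # the library may clamp the request
        t = @elapsed X * Y
        push!(results, actual => t)
    end
    BLAS.set_num_threads(old)              # restore the original setting
    return results
end

println("logical CPU threads: ", Sys.CPU_THREADS)
for (nt, t) in time_matmul(2_000, (1, 2, 4, Sys.CPU_THREADS))
    println("BLAS threads = $nt: $(round(t; digits = 4)) s")
end
```

Note that `Sys.CPU_THREADS` counts logical threads, so on a machine with hyper-threading the last entry requests more threads than there are physical cores. Comparing that timing with the one at the physical core count shows whether the extra threads help at all.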