I am trying to optimise a function that solves a series of high-dimensional linear systems (using A\b) in parallel (one per iteration of a for-loop). I am not sure whether I should request more cores than I strictly need, as a buffer for these linear algebra operations (i.e., whether they already run in parallel automatically, so that I need to account for it). Could you please help me understand this point? I am open to alternative solutions, but it is crucial for me to solve the systems at the same time.

I am currently using Julia LTS, 16 CPU cores, and 16 GB RAM per core. The A matrix is 1,000 x 1,000 dimensional, while b is 1,000 x K where K << 1000. I was wondering whether it would be a good idea to increase the number of CPU cores and keep the total RAM as it is.

There are two kinds of parallelism, threads and processes, and you chose multiprocessing. With processes, if you use SharedArrays you should get parallel computations, provided the operation is implemented in parallel internally. With threads, regular Arrays are enough, and the underlying library (OpenBLAS + LAPACK?) should already use multiple threads. For large systems with sparse A and b, you could use SparseArrays.

I guess on LTS threads aren’t really an option? Any reason why you’re using LTS, @fipelle? It’s quite out of date now and you’re missing out on loads of great improvements to the language!

Note that Julia 1.6 will probably be the next LTS, and it will be out within a few months. One solution would be to just wait until the LTS version does this automatically.

@tomaklutfu: A has quite a lot of zeros, so I suppose I could store it as a sparse matrix using SparseArrays. Do I need to have a sparse b as well to see the benefits?
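On whether b has to be sparse too: a minimal sketch, assuming a randomly generated sparse A with a strong diagonal (added only to keep the toy system well conditioned) and a dense b. The sizes, density, and variable names are illustrative assumptions, not taken from the actual model. As far as I know, the sparse `\` accepts a dense right-hand side and returns a dense solution.

```julia
using LinearAlgebra
using SparseArrays

# Illustrative sizes (assumptions): n x n sparse A, dense n x K right-hand side.
n, K = 1000, 5

# Random sparse matrix with about 1% nonzeros, plus a large diagonal so that
# this toy system is comfortably invertible.
A = sprand(n, n, 0.01) + spdiagm(0 => fill(Float64(n), n))

# b is kept as a regular dense Matrix.
b = rand(n, K)

# Sparse left-hand side, dense right-hand side.
X = A \ b

# Residual norm as a sanity check on the solution.
residual = norm(A * X - b)
```

Whether this beats the dense solve depends on how sparse A really is, so it is worth timing both versions on the actual matrices.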

@nilshg: Frankly, I had to dedicate less time to Julia in the last semester due to Covid-19 related issues. In fact, TSAnalysis.jl is still waiting for a major update (I am writing a theoretical paper that could have an impact on it and I am finishing that first). The only software I managed to release is the replication code for a paper I co-authored that has just been accepted for publication at the Review of Economics and Statistics (not a proper package, but an interesting empirical application / technique).

@Oscar_Smith: I am not sure I understand. Is Julia 1.6 going to have automated multithreading for linear algebra operations?

On the multithreading: in many cases, linear algebra is already multithreaded, as it calls out to underlying BLAS (or MKL when using MKL.jl) routines. Julia 1.3 introduced composable multithreaded parallelism (announcement here, docs here) through Threads.@spawn, which complements the older Threads.@threads macro that can be prefixed to for loops.
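To illustrate how the two layers interact, here is a minimal sketch, assuming one independent system per loop iteration and Julia started with several threads (for example via the JULIA_NUM_THREADS environment variable). The sizes and names are made up for the example. Setting BLAS to a single thread is one common way to avoid oversubscribing the cores when the outer loop is already threaded; whether it is the faster configuration for a given problem should be checked by timing.

```julia
using LinearAlgebra

# Illustrative sizes (assumptions): 8 independent systems, each n x n with K right-hand sides.
n, K, nsystems = 200, 5, 8

# Toy inputs; the added diagonal only keeps the matrices well conditioned.
As = [rand(n, n) + n * I for _ in 1:nsystems]
Bs = [rand(n, K) for _ in 1:nsystems]

# Let the outer loop own the cores: one BLAS thread per solve.
BLAS.set_num_threads(1)

# Preallocate the output so each iteration writes to its own slot.
Xs = Vector{Matrix{Float64}}(undef, nsystems)

# Each iteration solves one system; iterations are split across Julia threads.
Threads.@threads for i in 1:nsystems
    Xs[i] = As[i] \ Bs[i]
end

# Largest residual norm across all systems, as a sanity check.
max_residual = maximum(norm(As[i] * Xs[i] - Bs[i]) for i in 1:nsystems)
```

With only one Julia thread the loop still runs, just serially. In that case it may be better to leave BLAS multithreaded instead.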

While multithreading has continued to improve since 1.3 and is likely going to continue improving, it is unlikely that there will be “automated” parallelism (other than what underlying libraries like BLAS do).

Thank you! Do you also know if there exists a good performance comparison between classical Julia multitasking, multithreading and a combination of the two?

Not aware of a definitive comparison, unfortunately. There’s a thread here, “The ultimate guide to distributed computing”, which discusses options in some detail, and there are some good answers from Bogumil and Przemyslaw on StackOverflow that might be helpful.