Multi-threaded inverse of a matrix

Hi there,

I thought that BLAS.set_num_threads(n) was enough to run all LinearAlgebra functions in a multi-threaded fashion. However, I came to realise that this actually applies mostly to matrix products. Inverses seem to be excluded from it - I suppose because they are based on LAPACK.

I was wondering if there’s a way to compute inverses in a multi-threaded fashion either within LinearAlgebra or via external dependencies. Right now, I would prefer using CPU rather than GPU.

I was also curious to know if there are caps in terms of the number of threads than LinearAlgebra can support. As highlighted in a different post, I have also noticed that running BLAS.set_num_threads(64) does not actually imply that 64 threads will be used or initialised (in my experiment the cluster set 32 threads and it used about 10% of the total CPU - comprising 64 cores).

Why do you think that? My test

using Random
using LinearAlgebra
using BenchmarkTools


A = rand(2000, 2000)
b = rand(2000)

@btime $A \ $b

@btime $A \ $b

@btime inv($A)

@btime inv($A)

shows speedup

  133.715 ms (4 allocations: 30.55 MiB) # A \ b, 1 thread 
  43.437 ms (4 allocations: 30.55 MiB) # A \ b, 6 threads
  420.827 ms (5 allocations: 31.51 MiB) # inv(A), 1 thread
  173.065 ms (5 allocations: 31.51 MiB) # inv(A), 6 threads
1 Like

Inverses aren’t excluded, because they’re based on BLAS/LAPACK (just as matmuls).

Yes, IIRC the max number of BLAS threads is set to 32 by Julia. This has changed in 1.8 where you can get, I believe, up to 512 BLAS threads. See BLAS performance testing for Julia 1.8

1 Like

Thanks, that must be it then. Is there any way for increasing this number in the LTS version?

No, I don’t think so (apart from changing the number in the source code and compiling from source).

(@viralbshah or do you know a trick here ?)

I do understand @fipelle’s concern. In other words, is there any reason not to change this in an upcoming release of LTS?

Don’t get me wrong, I also understand his concern and personally think that this is a footgun. I’m just answering the questions :slight_smile:

(BTW, I think that defaulting to something else than 1 for ‘OPENBLAS_NUM_THREADS’ when running Julia in multithreaded mode is a footgun as well…)

Don’t know but I’d guess no. The thing is that patch releases are generally for fixing bugs. Question is whether we can (deliberately) label this as a bug and then fix it. But even if we would, it might not be easy to technically fix this in the LTS because it involves OpenBLAS(_jll.jl), i.e. an external dependency. Others are more qualified to speak to this. Perhaps filling an issue and at least discussing this would make sense.