I thought that BLAS.set_num_threads(n) was enough to run all LinearAlgebra functions in a multi-threaded fashion. However, I came to realise that this actually applies mostly to matrix products. Inverses seem to be excluded from it - I suppose because they are based on LAPACK.
I was wondering if there’s a way to compute inverses in a multi-threaded fashion either within LinearAlgebra or via external dependencies. Right now, I would prefer using CPU rather than GPU.
I was also curious to know if there are caps in terms of the number of threads than LinearAlgebra can support. As highlighted in a different post, I have also noticed that running BLAS.set_num_threads(64) does not actually imply that 64 threads will be used or initialised (in my experiment the cluster set 32 threads and it used about 10% of the total CPU - comprising 64 cores).
using Random
using LinearAlgebra
using BenchmarkTools
Random.seed!(42)
A = rand(2000, 2000)
b = rand(2000)
BLAS.set_num_threads(1)
@btime $A \ $b
BLAS.set_num_threads(6)
@btime $A \ $b
BLAS.set_num_threads(1)
@btime inv($A)
BLAS.set_num_threads(6)
@btime inv($A)
shows speedup
133.715 ms (4 allocations: 30.55 MiB) # A \ b, 1 thread
43.437 ms (4 allocations: 30.55 MiB) # A \ b, 6 threads
420.827 ms (5 allocations: 31.51 MiB) # inv(A), 1 thread
173.065 ms (5 allocations: 31.51 MiB) # inv(A), 6 threads
Inverses aren’t excluded, because they’re based on BLAS/LAPACK (just as matmuls).
Yes, IIRC the max number of BLAS threads is set to 32 by Julia. This has changed in 1.8 where you can get, I believe, up to 512 BLAS threads. See BLAS performance testing for Julia 1.8
Don’t get me wrong, I also understand his concern and personally think that this is a footgun. I’m just answering the questions
(BTW, I think that defaulting to something else than 1 for ‘OPENBLAS_NUM_THREADS’ when running Julia in multithreaded mode is a footgun as well…)
Don’t know but I’d guess no. The thing is that patch releases are generally for fixing bugs. Question is whether we can (deliberately) label this as a bug and then fix it. But even if we would, it might not be easy to technically fix this in the LTS because it involves OpenBLAS(_jll.jl), i.e. an external dependency. Others are more qualified to speak to this. Perhaps filling an issue and at least discussing this would make sense.