I have recently been experimenting with Julia’s (experimental) multithreading feature and like the results so far. One of the problems I need to deal with involves multiplying several pairs (on the order of 5 to 50) of moderately sized matrices (linear size on the order of 10 to 1000). For that problem, two approaches compete: using Julia threads to loop over the different pairs, or using the multithreaded matrix multiplication provided by BLAS. Which one is more advantageous depends on the specific case (the number of pairs and the size of the matrices involved).
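To make the two options concrete, here is a minimal sketch. The function names `mul_pairs_threaded!` and `mul_pairs_blas!` are made up for illustration, and the pairs are assumed to be stored as three equally long vectors of matrices.

```julia
using LinearAlgebra

# Strategy 1 (hypothetical name): Julia threads loop over the pairs.
# Intended for use with BLAS restricted to one thread, e.g. BLAS.set_num_threads(1).
function mul_pairs_threaded!(Cs, As, Bs)
    Threads.@threads for i in eachindex(Cs, As, Bs)
        mul!(Cs[i], As[i], Bs[i])
    end
    return Cs
end

# Strategy 2 (hypothetical name): plain serial loop over the pairs;
# any parallelism comes from the multithreaded BLAS inside each mul!.
function mul_pairs_blas!(Cs, As, Bs)
    for i in eachindex(Cs, As, Bs)
        mul!(Cs[i], As[i], Bs[i])
    end
    return Cs
end
```

Both functions compute the same products; they differ only in where the parallelism is supposed to come from.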

However, it seems that using `BLAS.set_num_threads(n)` is

- very inflexible (e.g. how to obtain the current or default number of threads?)
- very slow (on the order of 350 microseconds, which is much more than the time required to e.g. multiply two 100 x 100 matrices).

So it’s not feasible to adjust the number of BLAS threads based on a quick analysis of each specific case, as the call itself would dominate the run time.
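A rough way to check this on a given machine is sketched below. It uses only `@elapsed` from Base rather than a benchmarking package, so the numbers are coarse. The helper `avgtime` is hypothetical, and `BLAS.get_num_threads()` is assumed to be available, which requires Julia 1.6 or newer.

```julia
using LinearAlgebra

# Hypothetical helper: average wall-clock time of `f()` over `reps` calls,
# after one warm-up call so that compilation is not timed.
function avgtime(f, reps)
    f()
    t = @elapsed for _ in 1:reps
        f()
    end
    return t / reps
end

A = rand(100, 100); B = rand(100, 100); C = similar(A)

# Assumption: Julia 1.6 or newer, where the current setting can be queried.
nthreads_blas = BLAS.get_num_threads()

# Re-apply the current setting, so that the configuration is left unchanged.
t_set = avgtime(() -> BLAS.set_num_threads(nthreads_blas), 1_000)
t_mul = avgtime(() -> mul!(C, A, B), 1_000)

println("set_num_threads: ", round(t_set * 1e6, digits = 1), " μs per call")
println("100 x 100 mul!:  ", round(t_mul * 1e6, digits = 1), " μs per call")
```

The timings depend on the machine, the BLAS library and the Julia version, so they may differ considerably from the figure quoted above.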

As an alternative, I started experimenting with just using `BLAS.set_num_threads(1)` at the beginning of my script/module, and then using my own multithreaded matrix multiplication:

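```
using LinearAlgebra  # provides mul!

function mymul!(C, A, B)
    (m, n) = size(C)
    mhalf = m >> 1
    nhalf = n >> 1
    # Row and column ranges of the four quadrants of C
    mrange = (1:mhalf, 1:mhalf, (mhalf+1):m, (mhalf+1):m)
    nrange = (1:nhalf, (nhalf+1):n, 1:nhalf, (nhalf+1):n)
    # Each quadrant of C is a row block of A times a column block of B
    Threads.@threads for i = 1:4
        mul!(view(C, mrange[i], nrange[i]), view(A, mrange[i], :), view(B, :, nrange[i]))
    end
    return C
end
```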

(This version is written specifically for square matrices and 4 threads, but a slightly more generic strategy can easily be written.)
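One possible shape for such a generic strategy is sketched below. This is an illustration, not a tuned implementation: the name `mymul_blocks!` and the keyword `ntasks` are hypothetical, and the choice to split only the columns of `C` into contiguous blocks is an assumption made to keep the code short.

```julia
using LinearAlgebra

# Hypothetical generalization: split the columns of C into contiguous blocks
# and compute each block with an independent mul! call.
function mymul_blocks!(C, A, B; ntasks::Integer = Threads.nthreads())
    n = size(C, 2)
    nblocks = max(1, min(ntasks, n))  # never more blocks than columns
    Threads.@threads for k in 1:nblocks
        # Columns assigned to block k; the blocks tile 1:n without overlap.
        lo = div((k - 1) * n, nblocks) + 1
        hi = div(k * n, nblocks)
        cols = lo:hi
        mul!(view(C, :, cols), A, view(B, :, cols))
    end
    return C
end
```

Splitting by columns keeps every block of `C` and `B` contiguous in memory, because Julia arrays are stored column-major. For tall, narrow results a split over rows or over quadrants, as in `mymul!`, may be the better choice.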

This seems to work surprisingly well, i.e. there is no noticeable difference from the multithreading provided by BLAS. But the advantage is that, if `mymul!` is called from within a threaded loop, it will automatically run single-threaded.

So my question is whether this is something that will need to be considered in Julia Base / `LinearAlgebra` as the multithreaded features of Julia become more established, or whether there are alternative solutions?