Hi,

I am working with large sparse matrices (10^5 x 10^5). I would like to make full use of the cores / threads of my CPU to accelerate the multiplication process. As of now, I am using mul!:

```
using SparseArrays,LinearAlgebra
n = 10000
L = sprand(Float64,n,n,0.3)
u0 = sprand(Float64,n,0.3)
u1 = similar(u0)
mul!(u1,L,u0);
```

However, this only uses one thread.

Is there a package or a trick to parallelise the multiplication? I know I could always do something like:

```
function parallel_mul!(u1,L,u0)
Threads.@threads for i in 1:size(u1)[1]
for j in 1:size(u1)[2]
mul!(u1[i,j],L[i,:],u0[:,j])
end
end
end
```

But maybe there is something more straightforward? Else, may be it is worth making `parallel_mul!`

more robust and implement it in some package?