I’m trying to optimize the eigenvalue (and eigenvector) calculation for a neutrino oscillation package which is currently under development (https://github.com/KM3NeT/Neurthino.jl).
The problem consists of a lot of small symmetric matrices where the eigenvalues should be determined and I thought maybe I can utilize the GPU for that problem. In order to pass this as a single problem to the GPU via a sparse matrix having all the small matrices on its trace.
For testing I created this matrix:
A_full = Symmetric(rand(3000, 3000))
When doing it in the “classic” way I get:
@benchmark eigs(A_full)
BenchmarkTools.Trial: 
  memory estimate:  808.38 KiB
  allocs estimate:  2571
  --------------
  minimum time:     519.974 ms (0.00% GC)
  median time:      527.320 ms (0.00% GC)
  mean time:        528.424 ms (0.00% GC)
  maximum time:     546.705 ms (0.00% GC)
  --------------
  samples:          10
  evals/sample:     1
I also refined the example with respect to my initial problem description and created this example:
dim = 3
n = 1000
A = spzeros(dim*n, dim*n)
for i in 1:n
     A[(i-1)*dim+1:i*dim, (i-1)*dim+1:i*dim] = Symmetric(rand(dim, dim))
end
and when calculating the eigenvalues using Arpack.jl I get:
@benchmark eigs(A)
BenchmarkTools.Trial: 
  memory estimate:  810.03 KiB
  allocs estimate:  1889
  --------------
  minimum time:     24.614 ms (0.00% GC)
  median time:      27.885 ms (0.00% GC)
  mean time:        28.766 ms (0.12% GC)
  maximum time:     47.153 ms (0.00% GC)
  --------------
  samples:          174
  evals/sample:     1
Using the CUDA package I get:
A_cuda = CuArray{ComplexF64}(A)
@benchmark CUDA.@sync CUDA.CUSOLVER.heevd!('V','U', A_cuda)
BenchmarkTools.Trial: 
  memory estimate:  1.23 KiB
  allocs estimate:  33
  --------------
  minimum time:     1.768 s (0.00% GC)
  median time:      1.769 s (0.00% GC)
  mean time:        1.770 s (0.00% GC)
  maximum time:     1.773 s (0.00% GC)
  --------------
  samples:          3
  evals/sample:     1
As this is already slower than the first example it leads me to the question whether I have to change my approach?
If I stick to my approach I guess I have to change it to CUDA.CUSPARSE.CuSparseMatrixCSR(A) and use specific eigenvalue functionality for sparse arrays, because I have seen there is a function CUDA.CUSOLVER.csreigs, but I don’t really get how to use it (with respect to its arguments).
Btw. all the testing was done using a NVIDIA GTX 1080 Ti.
Can you help me out at that point?
Thanks.