I’m trying to optimize the eigenvalue (and eigenvector) calculation for a neutrino oscillation package which is currently under development (https://github.com/KM3NeT/Neurthino.jl).
The problem consists of many small symmetric matrices whose eigenvalues (and eigenvectors) need to be determined, and I thought I might be able to utilize the GPU for that. My idea is to hand everything to the GPU as a single problem: a block-diagonal sparse matrix with all the small matrices along its diagonal.
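Schematically, the task is just many independent small eigenproblems; a plain CPU version (with random stand-ins for the actual 3×3 Hamiltonians) looks like this:

using LinearAlgebra

mats = [Symmetric(rand(3, 3)) for _ in 1:1000]  # stand-ins for the oscillation Hamiltonians
decomps = [eigen(m) for m in mats]              # one small decomposition per matrix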
For testing I created this matrix:

using LinearAlgebra, Arpack, BenchmarkTools
A_full = Symmetric(rand(3000, 3000))
When doing it in the “classic” way I get:
@benchmark eigs(A_full)
BenchmarkTools.Trial:
memory estimate: 808.38 KiB
allocs estimate: 2571
--------------
minimum time: 519.974 ms (0.00% GC)
median time: 527.320 ms (0.00% GC)
mean time: 528.424 ms (0.00% GC)
maximum time: 546.705 ms (0.00% GC)
--------------
samples: 10
evals/sample: 1
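For reference, eigs computes only a few Ritz pairs (Arpack.jl defaults to nev = 6), so the timing above does not cover the full spectrum; the full dense decomposition would be:

vals, vecs = eigen(A_full)  # full spectrum and eigenvectors via LAPACK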
I also refined the test case to match my actual problem (many small blocks) and created this example:
using SparseArrays

dim = 3
n = 1000
A = spzeros(dim * n, dim * n)
for i in 1:n
    # place one small symmetric block on the diagonal
    A[(i-1)*dim+1:i*dim, (i-1)*dim+1:i*dim] = Symmetric(rand(dim, dim))
end
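As far as I can tell, the same block-diagonal matrix can also be assembled with blockdiag from SparseArrays (splatting n blocks is not pretty, but it works):

blocks = [sparse(Symmetric(rand(dim, dim))) for _ in 1:n]
A = blockdiag(blocks...)  # (dim*n) × (dim*n) block-diagonal sparse matrix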
When calculating the eigenvalues using Arpack.jl I get:
@benchmark eigs(A)
BenchmarkTools.Trial:
memory estimate: 810.03 KiB
allocs estimate: 1889
--------------
minimum time: 24.614 ms (0.00% GC)
median time: 27.885 ms (0.00% GC)
mean time: 28.766 ms (0.12% GC)
maximum time: 47.153 ms (0.00% GC)
--------------
samples: 174
evals/sample: 1
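For completeness, the number and kind of returned eigenvalues can be steered with keywords, e.g.:

vals, vecs = eigs(A; nev = 6, which = :LM)  # 6 largest-magnitude pairs; :SM would give the smallest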
Using the CUDA package I get:
using CUDA

A_cuda = CuArray{ComplexF64}(A)  # densified copy on the GPU; heevd! is the Hermitian (complex) solver
@benchmark CUDA.@sync CUDA.CUSOLVER.heevd!('V', 'U', A_cuda)
BenchmarkTools.Trial:
memory estimate: 1.23 KiB
allocs estimate: 33
--------------
minimum time: 1.768 s (0.00% GC)
median time: 1.769 s (0.00% GC)
mean time: 1.770 s (0.00% GC)
maximum time: 1.773 s (0.00% GC)
--------------
samples: 3
evals/sample: 1
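One caveat I noticed: heevd! overwrites its input (with jobz = 'V' the eigenvectors replace the matrix), so after the first sample A_cuda no longer contains the original matrix. A fairer benchmark probably needs a fresh copy per sample, e.g.:

@benchmark CUDA.@sync CUDA.CUSOLVER.heevd!('V', 'U', B) setup = (B = CuArray{ComplexF64}(A)) evals = 1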
As this is already slower than the first example, I'm wondering whether I have to change my approach. If I stick with it, I guess I have to convert the matrix via CUDA.CUSPARSE.CuSparseMatrixCSR(A) and use eigenvalue functionality specific to sparse arrays. I have seen that there is a function CUDA.CUSOLVER.csreigs, but I don't really understand how to use it (with respect to its arguments).
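One more thing I stumbled over: CUSOLVER itself ships a batched Jacobi eigensolver for exactly this many-small-matrices situation (cusolverDn<t>syevjBatched / heevjBatched). I'm not sure whether and how CUDA.jl exposes it, but assuming a wrapper named CUSOLVER.syevjBatched! that takes the batch as a 3-D array (both the name and the signature are my guess, not verified), the call might look roughly like:

# hypothetical sketch -- wrapper name and signature are assumptions
h = rand(Float32, dim, dim, n)
for i in 1:n
    h[:, :, i] .= (h[:, :, i] .+ h[:, :, i]') ./ 2  # symmetrize each slice
end
A_batch = CuArray(h)                                # n small matrices in one 3-D array
vals = CUDA.CUSOLVER.syevjBatched!('V', 'U', A_batch)

Would that be the right direction?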
Btw., all the testing was done on an NVIDIA GTX 1080 Ti.
Can anyone help me out here?
Thanks.