The Arpack eigenvalue solver is very fast. With a 100000x100000 sparse matrix, I can get a few eigenvalues with the shift-invert method in approximately 20-30 seconds!
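For reference, this is roughly the Arpack.jl call I am comparing against (a sketch; the shift and nev values are only illustrative). With the sigma keyword, eigs performs the shift-invert internally and returns the eigenvalues of H closest to the shift.
using Arpack
# Shift-invert around sigma: :LM on the transformed operator picks the
# eigenvalues of H closest to sigma.
vals_arpack, vecs_arpack = eigs(H; nev = 8, sigma = -2, which = :LM)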
Since I always want the best performance, I wondered if it was possible to calculate a limited number of eigenvalues using the CUDA.jl package.
As an exercise, I first sped up the CPU algorithm using the KrylovKit.jl package. Here is the minimal code:
using LinearAlgebra, KrylovKit, IterativeSolvers, IncompleteLU, CUDA
CUDA.allowscalar(false)
# H is an Hermitian matrix. It is often positive-definite.
my_eltype = eltype(H)
n = size(H, 1)
sigma_shift = -2 # the shift for the shift-invert method
posdef_shift = 5 # possibly unnecessary: it shifts the whole matrix
# to make sure it is positive definite
k = 8 # Number of eigenvalues to return
v0 = rand(my_eltype, n)
v0 /= norm(v0)
A_cpu = H
A_shifted_cpu = A_cpu + (posdef_shift - sigma_shift) * I
P_exact_cpu = factorize(A_shifted_cpu)
@time vals, vecs, info = eigsolve(x -> P_exact_cpu \ x, v0, k, ishermitian = true)
vals = (1 .+ (sigma_shift - posdef_shift) * vals) ./ vals
vals[1:k]
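Here eigsolve returns the eigenvalues mu of (H + (posdef_shift - sigma_shift) * I)^(-1), so the line above recovers the original eigenvalues as lambda = 1/mu - (posdef_shift - sigma_shift) = (1 + (sigma_shift - posdef_shift) * mu) / mu. As a quick sanity check (a sketch, assuming at least k pairs converged):
# The vectors returned by eigsolve are eigenvectors of the inverted operator,
# hence of H itself, so the residuals should be small after the back-transformation.
for (lam, v) in zip(vals[1:k], vecs[1:k])
    println(norm(A_cpu * v - lam * v))
end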
or, using the preconditioned cg from IterativeSolvers.jl:
P_cpu = ilu(A_shifted_cpu, τ = 0.01)
@time vals, vecs, info = eigsolve(x -> cg(A_shifted_cpu, x; Pl = P_cpu, maxiter = 2000), v0, k, ishermitian = true)
vals = (1 .+ (sigma_shift - posdef_shift) * vals) ./ vals
vals[1:k]
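The runtime of this variant also depends on the outer Krylov parameters. A sketch using KrylovKit's keyword arguments (the values are illustrative, not tuned):
# krylovdim sets the subspace size and tol the outer convergence tolerance; a larger
# subspace usually means fewer restarts, i.e. fewer calls to the (expensive) inner cg.
@time vals, vecs, info = eigsolve(x -> cg(A_shifted_cpu, x; Pl = P_cpu, maxiter = 2000),
    v0, k, ishermitian = true, krylovdim = 40, tol = 1e-8)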
Both variants are a little bit faster than the Arpack.jl method.
Taking advantage of the versatility of the CUDA.jl, IterativeSolvers.jl, and KrylovKit.jl packages, I implemented this GPU version:
function LinearAlgebra.ldiv!(y::CuArray, P::CUSPARSE.CuSparseMatrixCSC, x::CuArray)
    copyto!(y, x)
    # The commented calls are the CuSparseMatrixCSR variant
    CUSPARSE.sv2!('T', 'U', 'N', 1.0, P, y, 'O') # sv2!('N', 'L', 'N', 1.0, P, y, 'O')
    CUSPARSE.sv2!('N', 'U', 'N', 1.0, P, y, 'O') # sv2!('T', 'L', 'N', 1.0, P, y, 'O')
    return y
end
function LinearAlgebra.ldiv!(P::CUSPARSE.CuSparseMatrixCSC, x::CuArray)
    CUSPARSE.sv2!('T', 'U', 'N', 1.0, P, x, 'O') # sv2!('N', 'L', 'N', 1.0, P, x, 'O')
    CUSPARSE.sv2!('N', 'U', 'N', 1.0, P, x, 'O') # sv2!('T', 'L', 'N', 1.0, P, x, 'O')
    return x
end
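For completeness, a sketch of the CuSparseMatrixCSR variant hinted at in the comments above (assuming the same low-level CUSPARSE.sv2! API): the incomplete Cholesky factor is then stored in the lower triangle, so the two triangular solves use 'L' with 'N' and 'T'.
function LinearAlgebra.ldiv!(y::CuArray, P::CUSPARSE.CuSparseMatrixCSR, x::CuArray)
    copyto!(y, x)
    CUSPARSE.sv2!('N', 'L', 'N', 1.0, P, y, 'O') # forward solve with the lower factor
    CUSPARSE.sv2!('T', 'L', 'N', 1.0, P, y, 'O') # backward solve with its transpose
    return y
end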
v0 = CUDA.rand(my_eltype, n)
v0 /= norm(v0)
A_shifted_gpu = CUSPARSE.CuSparseMatrixCSC(A_shifted_cpu)
P_gpu = CUSPARSE.ic02(A_shifted_gpu, 'O')
@time vals, vecs, info = eigsolve(x -> cg(A_shifted_gpu, x; Pl = P_gpu, maxiter = 2000), v0, k, ishermitian = true)
vals = (1 .+ (sigma_shift - posdef_shift) * vals) ./ vals
vals[1:k]
However, this GPU method is slower than both CPU versions, and it is expensive in terms of GPU memory.
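To get a rough idea of the memory cost I just inspect the allocation pool (a sketch using standard CUDA.jl utilities):
CUDA.memory_status() # print how much of the GPU memory is currently allocated
GC.gc()
CUDA.reclaim()       # release cached blocks back to the driver
CUDA.memory_status()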
I think the bottleneck is the linear solve itself, since
v0 = rand(my_eltype, n)
v0 /= norm(v0)
v0_gpu = CuArray(v0)
@time test1 = P_exact_cpu \ v0
println(test1[1])
println("")
@time test2 = cg(A_shifted_cpu, v0; Pl = P_cpu, maxiter = 2000)
println(test2[1])
println("")
@time test3 = cg(A_shifted_gpu, v0_gpu; Pl = P_gpu, maxiter = 2000)
println(Array(test3)[1])
println("")
returns
"""
0.000764 seconds (12 allocations: 127.094 KiB)
1.966588992514033e-6 + 1.98235638546855e-6im
0.001486 seconds (18 allocations: 126.375 KiB)
1.9665889918161988e-6 + 1.982356384764342e-6im
0.089783 seconds (5.69 k allocations: 244.609 KiB)
1.9665889861684783e-6 + 1.9823563706705657e-6im
"""
I think that solving eigenvalue problems for large matrices on the GPU would be very useful, and I hope to succeed one day.