# SVD solve with CUSOLVER

Hi,

I am trying to do an eigendecomposition using CUSOLVER, but it is slower than on the CPU. My GPU is a GeForce 1060; maybe it simply is not good enough? I think I am not doing it right and hope someone can help me. Thank you very much. Here is the code:

```julia
using CuArrays, CUDAnative, LinearAlgebra

# prepare matrix to decompose
matrix = rand(Float32, 100, 100)
matrix = matrix + matrix'
matrix_d = cu(matrix)

# CPU version
function cpu_eigen(mat)
    eigen(mat)
end

# CUSOLVER version
function gpu_eigen(mat)
    CuArrays.CUSOLVER.syevd!('V', 'U', mat)
end

# time (second call excludes compilation)
@time cpu_eigen(matrix)
@time cpu_eigen(matrix)   # 0.0039 s
@time gpu_eigen(matrix_d)
@time gpu_eigen(matrix_d) # 0.024 s
```

I don't think you'll benefit from using a GPU on a matrix of size `(100, 100)`.
These are the timings on my GTX 970 (the `yscale` argument messes with the alignment of the bars at the bottom, sorry):

Perhaps you would rather look at the relative performance-gain:

Try increasing the size of the matrix.
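A side note on methodology: the first `@time` call includes JIT compilation, and for sub-millisecond calls `BenchmarkTools.@btime` gives steadier numbers than a second `@time`. A minimal sketch, assuming BenchmarkTools.jl is installed:

```julia
using LinearAlgebra, BenchmarkTools

A = rand(Float32, 100, 100)
A = A + A'

# @btime runs the call many times and reports the minimum time;
# the $ interpolation keeps global-variable overhead out of the measurement
@btime eigen($A);
```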

You may also want to consider calling `LinearAlgebra.LAPACK.syev!` for the CPU version for a more direct comparison, since `eigen` uses `LinearAlgebra.LAPACK.geevx!`.
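For reference, a direct `syev!` call might look like this (a minimal sketch; note that `syev!` overwrites its input, so pass it a copy):

```julia
using LinearAlgebra

A = rand(Float32, 1000, 1000)
A = A + A'  # syev! expects a symmetric matrix

# syev! overwrites its argument, so hand it a copy;
# with jobz = 'V' it returns eigenvalues and eigenvectors
w, V = LinearAlgebra.LAPACK.syev!('V', 'U', copy(A))

# wrapping in Symmetric also routes eigen to a symmetric LAPACK driver
F = eigen(Symmetric(A))
```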

My benchmark
```julia
using CuArrays, CUDAnative, LinearAlgebra
using Plots, StatsPlots

# CPU version
function cpu_eigen(mat)
    eigen(mat) # uses geevx!, not syev!
end

# CUSOLVER version
function gpu_eigen(mat)
    CuArrays.CUSOLVER.syevd!('V', 'U', mat)
end

function timing(mat)
    matrix = mat + mat'
    matrix_d = cu(matrix)
    t1 = @elapsed cpu_eigen(matrix)
    t2 = @elapsed gpu_eigen(matrix_d)
    return t1, t2
end

function main(datatype = Float32, N = 100)
    matrix = rand(datatype, N, N)
    tcpu, tgpu = timing(matrix)
end

main(N) = main(Float32, N)

Nrange = 2 .^ (0:12)
Nl = length(Nrange)
data = main.(Nrange) # collect benchmarks

# Rearrange into an array for plotting
arraydata = [[data[i][1] for i in 1:Nl] [data[i][2] for i in 1:Nl]]

# Plot timings
pyplot()
exponents = repeat(0:12, outer = 2)
group = repeat(["CPU", "GPU"], inner = Nl)
timingplot = StatsPlots.groupedbar(exponents, arraydata, group = group,
    title = "Timing eigen on CPU and GPU", xlabel = "log₂(N)",
    ylabel = "Elapsed time (s)", yscale = :log10,
    ylims = (2.0e-6, 100.0), bar_width = 0.5)
savefig(timingplot, "timing.png")

# Plot speedups
speedupplot = plot(-1:13, repeat([1], 15))
plot!(speedupplot, 0:12, arraydata[:, 1] ./ arraydata[:, 2],
    linetype = :bar, title = "GPU speedup over CPU", xlabel = "log₂(N)",
    xlims = (-0.5, 12.5), ylabel = "Speedup", yscale = :log10,
    bar_width = 0.25, legend = false)
savefig(speedupplot, "speedup.png")
```

Thank you very much. You are right, and I am getting the same speedup as the matrix size increases.

My original problem was trying to optimize the eigendecomposition of a matrix that is ~60% zeros. The CPU `eigen` runs faster than the GPU `syevd!`, while the CPU `svd` is, as you suggested, the slowest.

```julia
using CuArrays, CUDAnative, SparseArrays, LinearAlgebra

# prepare matrix
mat = sprand(3500, 3500, 0.2)
mat = Array(mat + mat')
mat_d = cu(mat)

# functions
function gpu_eigen(mat)
    CuArrays.CUSOLVER.syevd!('V', 'U', mat)
end

function cpu_eigen(mat)
    eigen(mat)
end

function cpu_svd(mat)
    svd(mat)
end

# time; CPU runs use 12 threads
@time cpu_eigen(mat)   # 5 s
@time gpu_eigen(mat_d) # 12 s
@time cpu_svd(mat)     # 20 s
```

I do get a significant speedup from the GPU as the matrix size increases, but I am mainly focused on matrices around 3500×3500 with ~40% density. Do I have better options on the GPU? Perhaps a GPU equivalent of `eigen` for sparse matrices?
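One avenue I am considering: if only a few eigenpairs are needed rather than the full decomposition, an iterative solver such as Arpack.jl's `eigs` works directly on the sparse matrix, so the zeros are never densified. A minimal sketch (assuming Arpack.jl is installed; `nev` and `which` select how many and which eigenpairs):

```julia
using SparseArrays, LinearAlgebra, Arpack

S = sprand(3500, 3500, 0.2)
S = S + S'  # symmetric, still stored sparse

# compute the 6 eigenpairs of largest magnitude;
# eigs returns (values, vectors, ...), so destructuring takes the first two
vals, vecs = eigs(S; nev = 6, which = :LM)
```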