Solving a linear system using CuArrays.jl

I am new to GPU parallel computing. From what I know of CUSOLVER and CUSPARSE, I believe I can use them for my task: solving large sparse linear systems in parallel.

I have read the documentation of CUDA.jl, part of the code of CuArrays.jl (a bit difficult for me :sweat_smile:), and the official CUDA manual for the cuSOLVER library. Here is my code:

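using SparseArrays, CuArrays
using CuArrays: CUSOLVER

# Solve A * x = b on the GPU with the sparse QR solver

n = 10

A = sprand(Float32, n, n, 0.5)
A = sparse(A*A')    # symmetric test matrix
d_A = CuArrays.CUSPARSE.CuSparseMatrixCSR(A)

b = rand(Float32, n)
d_b = CuArray(b)

x = zeros(Float32, n)
d_x = CuArray(x)

tol = 1f-4    # Float32 tolerance passed to the solver
d_x = CUSOLVER.csrlsvqr!(d_A, d_b, d_x, tol, one(Cint), 'O')
h_x = collect(d_x)    # copy the solution back to the host

# Compare with a dense CPU solve
h_x ≈ Array(A)\b

> true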

For n = 10 the comparison returns true.

But as n increases (e.g. n = 1000), the comparison always returns false. Why do the GPU and CPU results differ? :grinning:

Could it be due to bad conditioning in the matrix?
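One rough way to test this on the CPU, before involving the GPU at all, is to watch how the condition number of A*A' grows with n. The sketch below is only an illustration: `gram_condition` is a helper name made up here, it works in Float64 on a dense copy, and the sizes are arbitrary. It relies on the usual rule of thumb that the relative error of a solve can be as large as roughly the condition number times the machine epsilon.

```julia
using LinearAlgebra, SparseArrays, Random

Random.seed!(1234)  # fixed seed so the run is repeatable

# Condition number of the same kind of test matrix as in the question.
# (Hypothetical helper, not part of CuArrays or CUSOLVER.)
function gram_condition(n; density = 0.5)
    A = sprand(Float64, n, n, density)
    G = Matrix(A * A')   # dense copy, so cond() can use the SVD
    return cond(G)       # 2-norm condition number; mathematically cond(A)^2
end

for n in (10, 100, 500)
    κ = gram_condition(n)
    # Rule of thumb: the relative error in x can reach about κ * eps.
    println("n = $n  cond ≈ $κ  cond * eps(Float32) ≈ $(κ * eps(Float32))")
end
```

If cond * eps(Float32) is not far below 1 for the larger sizes, a Float32 solution can differ visibly from the reference even when both solvers behave correctly, and `≈` with its default tolerance may then return false.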

Thank you for your reply. :grin:

I want to take a random sparse matrix generated as above and solve the system in parallel on the GPU. This is only a test; afterwards I will run the actual problem and look at its results.

Besides, I’m not sure whether the parameters of CUSOLVER.csrlsvqr!(...) (such as tol) affect the computed result. How should I choose suitable values for the actual problem?

There is a “big” probability that A is not invertible. Try solving with I + A instead.
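A minimal CPU sketch of why the shift helps, assuming A is built as in the question (so it is symmetric with non-negative eigenvalues): adding the identity moves every eigenvalue up by 1, so the shifted matrix cannot be singular. The names `S`, `λmin`, and `backward_error` are introduced here for illustration, and n = 200 is arbitrary. Note that I + A is a different system from the original one, so this only serves as a well-posed test case.

```julia
using LinearAlgebra, SparseArrays, Random

Random.seed!(42)  # fixed seed so the run is repeatable

n = 200
A = sprand(Float32, n, n, 0.5)
A = sparse(A * A')      # same construction as in the question
S = A + I               # shifted matrix; eigenvalues are those of A plus 1

b = rand(Float32, n)
x = Matrix(S) \ b       # dense CPU solve, used only as a reference

# Smallest eigenvalue of the shifted matrix (should be about 1 or larger).
λmin = eigmin(Symmetric(Matrix(S)))

# Normwise backward error: small if x solves a nearby system accurately.
backward_error = norm(S * x - b, Inf) /
                 (opnorm(Matrix(S), Inf) * norm(x, Inf) + norm(b, Inf))

println("smallest eigenvalue ≈ ", λmin)
println("backward error ≈ ", backward_error)
```

The same `S` can then be converted with CuSparseMatrixCSR and passed to the GPU solver in place of A, to check whether the mismatch at large n goes away.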
