I am new to GPU parallel computing. From what I know of CUSOLVER and CUSPARSE, I believe they can handle my task: solving large sparse linear systems in parallel.
I have read the documentation of CUDA.jl, part of the code of CuArrays.jl (a bit difficult for me), and the official CUDA manual for the CUSOLVER library. Here is my code:
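using SparseArrays, CuArrays, CuArrays.CUSPARSE, CuArrays.CUSOLVER
# Solve A * x = b with the sparse QR solver from CUSOLVER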
n = 10
A = sprand(Float32, n, n, 0.5)
A = sparse(A*A')
d_A = CuArrays.CUSPARSE.CuSparseMatrixCSR(A)
b = rand(Float32, n)
d_b = CuArray(b)
x = zeros(Float32, n)
d_x = CuArray(x)
tol = convert(real(Float32), 1e-4)
d_x = CUSOLVER.csrlsvqr!(d_A, d_b, d_x, tol, one(Cint), 'O')
h_x = collect(d_x)
h_x ≈ Array(A)\b
> true
The comparison returns true. But as n increases (e.g. n = 1000), the comparison always returns false. Why do the two results differ?
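A CPU-only sketch that may help narrow this down: it builds the same kind of matrix, solves it in Float32 and in Float64, and reports the condition number, the relative residual, and how far the two solutions disagree. The helper name `diagnose` and the sizes in the loop are my own choices for illustration, not part of any library, and nothing here touches the GPU.

```julia
using LinearAlgebra, SparseArrays, Random

Random.seed!(0)  # fixed seed so repeated runs are comparable

# Hypothetical helper: build a system like the one above and measure how
# sensitive it is to rounding in single precision.
function diagnose(n)
    A = sprand(Float32, n, n, 0.5)
    A = sparse(A * A')
    b = rand(Float32, n)

    x32 = Array(A) \ b                      # dense Float32 solve (the reference used above)
    x64 = Float64.(Array(A)) \ Float64.(b)  # same system solved in Float64

    relres = norm(A * x32 - b) / norm(b)    # relative residual of the Float32 solution
    relerr = norm(x32 - x64) / norm(x64)    # Float32 vs Float64 disagreement
    condA  = cond(Float64.(Array(A)))       # 2-norm condition number of A
    return (n = n, condA = condA, relres = relres, relerr = relerr)
end

for n in (100, 1000)
    println(diagnose(n))
end
```

If `relerr` for large n is already well above the default relative tolerance of `≈` for Float32 (the square root of `eps(Float32)`, about 3.5e-4), then two correct Float32 solvers could disagree by that much on their own, and comparing residuals such as `norm(A * h_x - b) / norm(b)` may be a fairer check than `h_x ≈ Array(A)\b`.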