Cannot solve A*x = B on the GPU with A a CuSparseMatrixCSC

Hello,

I am trying to solve the matrix equation A*x = B, where A is the discretization of a differential operator.
A is a CuSparseMatrixCSC{Float64}, and B and x are CuArray{Float64,1}.
I write the following line

x .= A \ B

and get the following warning:

Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with allowscalar(false)

and the following error:

MethodError: no method matching lu!(::CuSparseMatrixCSC{Float64}, ::Val{true}; check=true)
Closest candidates are:
lu!(!Matched::StridedArray{T, 2}, ::Union{Val{true}, Val{false}}; check) where T<:Union{Complex{Float32}, Complex{Float64}, Float32, Float64} at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LinearAlgebra\src\lu.jl:79

Stacktrace:
[1] lu(::CuSparseMatrixCSC{Float64}, ::Val{true}; check::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LinearAlgebra\src\lu.jl:273
[2] lu(::CuSparseMatrixCSC{Float64}, ::Val{true}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LinearAlgebra\src\lu.jl:272 (repeats 2 times)
[3] \(::CuSparseMatrixCSC{Float64}, ::CuArray{Float64,1}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LinearAlgebra\src\generic.jl:1116

For this test A is a small matrix, but once it works A will be a 5324800×5324800 CuSparseMatrixCSC{Float64} with 47806720 stored entries. Also, I have to do this operation at every time step (500’000 steps) because B = B(t).

Thank you in advance for your help,

Ludo

Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

At first glance, I would say the .= is wrong. The dot operator is used for broadcasting, which is not necessary here just to save the resulting vector in x.
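
Roughly, a minimal CPU-only illustration of the difference between the two spellings (toy arrays, just for this example):

using LinearAlgebra

A = [2.0 0.0; 0.0 4.0]
B = [1.0, 2.0]
x = zeros(2)

x .= A \ B   # computes A \ B into a temporary, then copies the result element-wise into the existing x
x = A \ B    # simply rebinds the name x to the newly allocated solution vector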

Unfortunately, I don’t know much about CUDA, so I can’t help you there.

Try using a CSR matrix and csrlsvqr!. Here’s an example:

using SparseArrays
using LinearAlgebra
using CUDA

n = 100
a = sprand(n, n, 0.2) + sparse(I, n, n)    # random sparse test matrix; adding I helps keep it nonsingular
A = CUSPARSE.CuSparseMatrixCSR(a)          # move it to the GPU in CSR format
b = CUDA.rand(Float64, n)
x = CuVector{Float64}(undef, n)            # output vector on the GPU

tol = 1e-8
CUSOLVER.csrlsvqr!(A, b, x, tol, one(Cint), 'O')   # sparse QR solve on the GPU

See https://github.com/JuliaGPU/CUDA.jl/blob/master/test/cusolver/sparse.jl


Thank you,

I tried to apply your advice and got

LinearAlgebra.SingularException(29)

I just realized that my matrix A is not invertible :cry:

After some changes to A it works well.
However, it is very slow compared to a naive iterative scheme.

As I have to call CUSOLVER.csrlsvqr!(A, b, x, tol, one(Cint), 'O') many times (500’000), I would like to know if I can first compute the QR (or LU) decomposition and then use the solver.
Or maybe there is a way to compute the decomposition of A only once and keep it in memory?

I’m not sure how to do a sparse factorization on the GPU directly; maybe someone else can help.

If you’d like to use iterative solvers, there are a few packages that implement Krylov-type methods. Also, maybe you could do the factorization on the CPU and copy the arrays to the GPU; I haven’t tried this, so it might not work.
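
For instance, here is a minimal sketch with Krylov.jl (one of those packages); I’m assuming its bicgstab solver and default keyword names, and I haven’t tried it on your actual operator:

using SparseArrays, LinearAlgebra
using CUDA
using Krylov   # Krylov.jl: Krylov methods that accept GPU arrays

n = 100
a = sprand(n, n, 0.2) + sparse(I, n, n)    # CPU sparse test matrix
A = CUDA.CUSPARSE.CuSparseMatrixCSR(a)     # move it to the GPU once
b = CUDA.rand(Float64, n)

# BiCGStab only needs matrix-vector products A*v, so nothing is factorized;
# the same A can be reused at every time step with a new right-hand side b.
x, stats = bicgstab(A, b; atol = 1e-8, rtol = 1e-8)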

This is what I do. You have an example here:
https://rveltz.github.io/BifurcationKit.jl/dev/tutorialsCGL/#Continuation-of-periodic-orbits-on-the-GPU-(Advanced)-1
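
The tutorial has the full details; a much simplified sketch of the “factorize once on the CPU, reuse it every step” idea could look like this (illustrative sizes and names, not the tutorial’s actual code):

using SparseArrays, LinearAlgebra
using CUDA

n = 1000
a = sprand(n, n, 5/n) + sparse(I, n, n)    # CPU sparse stand-in for the operator
F = lu(a)                                  # sparse LU (UMFPACK), computed once

x_gpu = CUDA.zeros(Float64, n)

for step in 1:10                           # in the real problem: 500’000 time steps
    b_gpu = CUDA.rand(Float64, n)          # stand-in for B = B(t), assembled on the GPU
    b_cpu = Array(b_gpu)                   # copy the right-hand side to the host
    x_cpu = F \ b_cpu                      # reuse the factorization: only triangular solves
    copyto!(x_gpu, x_cpu)                  # copy the solution back to the GPU
end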


Thank you,
So I tried this method, and it is working well!

However, I think my matrix is too large to be solved by a direct method. There are 80 times more elements in my L matrix than in A, and approximately the same ratio in U. (I’m using an incomplete LU decomposition with a 1e-7 tolerance; with that I get the same precision as a naive Jacobi iterative scheme. As my system is chaotic, I was looking for better precision.)
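
For reference, a minimal sketch of the kind of fill-in I’m seeing, assuming IncompleteLU.jl’s ilu and its L/U fields (my actual setup differs in the details):

using SparseArrays, LinearAlgebra
using IncompleteLU   # provides ilu(A; τ) for a SparseMatrixCSC

n = 1000
a = sprand(n, n, 5/n) + sparse(I, n, n)    # CPU sparse stand-in for A

fact = ilu(a, τ = 1e-7)                    # incomplete LU with drop tolerance 1e-7
@show nnz(a) nnz(fact.L) nnz(fact.U)       # compare stored entries to see the fill-in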

Thank you, I will have a look.