(fast) sparse solve on GPU

rveltz · December 15, 2019, 9:28am

Hi,

I would like to solve sparse linear systems on the GPU. But when I try the test examples from CuArrays, the GPU version seems way slower than the CPU one. Is this expected?

Additionally, say I have an LU factorisation of a CPU matrix. Can I transfer it to the GPU so that \ is overloaded? I may be asking for too much, but maybe somebody has already done it!

ChrisRackauckas · December 15, 2019, 9:37am

GPU sparse solves silently use the CPU for part of it IIRC, kind of like SVD.

You’ll want to upstream our fix in DiffEqBase: https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/init.jl#L146-L150

rveltz · December 15, 2019, 3:03pm

Interesting, I will try that!

rveltz · December 27, 2019, 5:24pm

Hi,

I tried the above solution but I have a bug I cannot find. It is about using \ with transposed sparse CuArrays. I perform an ILU decomposition and then try to use the decomposition to solve linear systems on the GPU:

using SparseArrays, LinearAlgebra, IncompleteLU
n = 1000
A = I + sprand(n,n,0.01)
Precilu = ilu(A, τ = 0.1)

Now I check that my formula for the inverse works:

rhs = rand(n)
sol_0 = Precilu \ rhs
sol_1 = Precilu.U' \ ((I+Precilu.L)  \ (rhs))
norm(sol_1-sol_0, Inf64)

It returns 0! Hence, I now have to perform these solves on the GPU. The first one works well:

using CuArrays
CuArrays.allowscalar(false)
sol_0 = (I+Precilu.L) \ rhs
sol_1 = LowerTriangular(CuArrays.CUSPARSE.CuSparseMatrixCSR(I+Precilu.L)) \ CuArray(rhs)
norm(sol_0-Array(sol_1), Inf64)

and this returns 4.492221705731936e-9. However, the following fails and I don’t understand why:

sol_0 = (Precilu.U)' \ rhs
sol_1 = LowerTriangular(CuArrays.CUSPARSE.CuSparseMatrixCSR(sparse(Precilu.U'))) \ CuArray(rhs)
norm(sol_0-Array(sol_1), Inf64)

and this returns 1137.9857325312835. Maybe the fact that

rhs = rand(n)
sol_0 = Precilu \ rhs
sol_1 = LowerTriangular(Precilu.U') \ (LowerTriangular(I+Precilu.L) \ rhs)
norm(sol_1-sol_0, Inf64)

does not return zero, whereas this one does, is the key:

rhs = rand(n)
sol_0 = Precilu \ rhs
sol_1 = (Precilu.U') \ (LowerTriangular(I+Precilu.L) \ rhs)
norm(sol_1-sol_0, Inf64)

mohamed82008 · December 27, 2019, 6:12pm

It seems that Precilu.U is stored as a lower triangular matrix, not an upper triangular one. So Precilu.U' is upper triangular, and LowerTriangular(Precilu.U') in your code should be UpperTriangular(Precilu.U'). That makes sense, because the second linear solve should use the upper triangular factor of the factorization. I don’t know why @stabbles went with that convention in the package, but fixing the above fixes your problem.
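
To see the effect without IncompleteLU or a GPU, here is a minimal CPU-only sketch. The names U_true and U_stored are made up for illustration, and U_stored only mimics what Precilu.U appears to be (the upper factor stored transposed); it is not the package's actual output.

```julia
using SparseArrays, LinearAlgebra

n = 5
# A well-conditioned upper-triangular matrix playing the role of the true U factor.
U_true = sparse(triu(rand(n, n)) + n * I)
# Assumption: like Precilu.U, the factor is stored transposed, so it is lower triangular.
U_stored = sparse(U_true')

rhs = rand(n)
ref = U_true \ rhs

# Correct wrapper: U_stored' is upper triangular.
good = UpperTriangular(U_stored') \ rhs
# Wrong wrapper: LowerTriangular only sees the diagonal of U_stored'.
bad = LowerTriangular(U_stored') \ rhs

println(istril(U_stored))        # the stored factor is lower triangular
println(norm(good - ref, Inf))   # round-off level
println(norm(bad - ref, Inf))    # clearly nonzero
```

With the wrong wrapper the solve silently drops every off-diagonal entry, which would explain a large error rather than an exception.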

rveltz · December 28, 2019, 7:52am

Wow, this was nicely spotted, thank you!

stabbles · December 28, 2019, 9:18pm

It’s a technical issue: in Crout ILU you get the U factor row by row, but a SparseMatrixCSC is built column by column.
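
A rough sketch of what that means, with made-up data (this is not the IncompleteLU source): appending each finished row of U as the next column of a CSC matrix is cheap, but the result is the transpose of U.

```julia
using SparseArrays

# Hypothetical rows of a 3x3 upper-triangular factor, produced one row at a time:
# (column indices, values) for each row.
rows = [
    ([1, 2, 3], [2.0, 1.0, 4.0]),  # row 1
    ([2, 3],    [3.0, 5.0]),       # row 2
    ([3],       [6.0]),            # row 3
]

# CSC storage grows by appending columns, so each row of U is written as a column.
colptr = [1]
rowval = Int[]
nzval = Float64[]
for (idx, vals) in rows
    append!(rowval, idx)
    append!(nzval, vals)
    push!(colptr, length(rowval) + 1)
end
Ut = SparseMatrixCSC(3, 3, colptr, rowval, nzval)

println(istril(Ut))   # the stored matrix is lower triangular
display(Matrix(Ut'))  # its transpose is the actual upper factor
```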

mohamed82008 · December 28, 2019, 9:25pm

Might be worth returning a Transpose then; it’s a bit counter-intuitive that U is lower triangular.
