(fast) sparse solve on GPU

Hi,

I would like to solve sparse linear systems on the GPU. But when I try the tests example of CuArrays, the GPU version seems way slower than the CPU one. Is it expected?

Additionnally, say I have a LU factorisation of a cpu matrix. Can I transfer this to the GPU so that \ is overloaded. I may be asking for too much :thinking: but maybe somebody has already done it!

GPU sparse solves silently use the CPU for part of it IIRC, kind of like SVD.

You’ll want to upstream our fix in DiffEqBase: https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/init.jl#L146-L150

1 Like

Interesting, I will try that!

Hi,

I tried the above solution but I have a bug I cannot find. It is about using \ for transpose sparse CuArrays. I perform an iLU decomposition and then try to use the decomposition to solve linear systems on the GPU:

using SparseArrays, LinearAlgebra, IncompleteLU
n = 1000
A = I + sprand(n,n,0.01)
Precilu = ilu(A, τ = 0.1)

Now I check that my formula for the inverse works:

rhs = rand(n)
sol_0 = Precilu \ rhs
sol_1 = Precilu.U' \ ((I+Precilu.L)  \ (rhs))
norm(sol_1-sol_0, Inf64)

It returns 0!! Hence, I have perform these solves in the GPU. The first one works well:

using CuArrays
CuArrays.allowscalar(false)
sol_0 = (I+Precilu.L) \ rhs
sol_1 = LowerTriangular(CuArrays.CUSPARSE.CuSparseMatrixCSR(I+Precilu.L)) \ CuArray(rhs)
norm(sol_0-Array(sol_1), Inf64) 

and this returns the 4.492221705731936e-9. However, the following fails and I don’t understand why :roll_eyes:

sol_0 = (Precilu.U)' \ rhs
sol_1 = LowerTriangular(CuArrays.CUSPARSE.CuSparseMatrixCSR(sparse(Precilu.U'))) \ CuArray(rhs)
norm(sol_0-Array(sol_1), Inf64)

and this returns 1137.9857325312835. Maybe the fact that

rhs = rand(n)
	sol_0 = Precilu \ rhs
	sol_1 = LowerTriangular(Precilu.U') \ (LowerTriangular(I+Precilu.L)  \ (rhs))
	norm(sol_1-sol_0, Inf64)

does not return zero whereas this one does is key :thinking:

rhs = rand(n)
sol_0 = Precilu \ rhs
sol_1 =(Precilu.U') \ (LowerTriangular(I+Precilu.L)  \ (rhs))
norm(sol_1-sol_0, Inf64)

It seems that Precilu.U is not UpperTriangular but lower. So LowerTriangular(Precilu.U') in your code should be UpperTriangular(Precilu.U') which makes sense because the second linear system solve should be using the upper triangular matrix from the factorization. I don’t know why @stabbles went with that convention in the package but fixing the above fixes your problem.

Wow, this was nicely spotted, thank you!

It’s a technical issue; you get the U factor row by row in crout ILU, but SparseMatrixCSC is created column by column.

Might be worth returning a Transpose then, it’s a bit counter-intuitive that U is lower triangular :slight_smile: