How to use GPU acceleration to solve linear equation Ax=b

automaticvehiclerook · October 3, 2024, 7:45am

In my experiment,I want to solve the equation Ax=b on GPU to accelerate the process.I have tried the iterative solver from Kryvol.jl and cuSOLVER,but the GPU is still slower than CPU.Is that normal?

using CUDA
using SparseArrays
using LinearAlgebra
using Krylov
using BenchmarkTools
using JLD2
# Function： Using BiCGStab on CPU to solve the sparse matrix linear equation
function solve_sparse_cpu(A, b)
    x = zeros(Float32, size(b))
    x=Krylov.bicgstab(A, b)
    return x
end

# Function： Using BiCGStab on GPU to solve the sparse matrix linear equation
function solve_sparse_gpu(d_A, d_b)
    
    d_x = CUDA.zeros(Float32, size(d_b))  

    d_x=Krylov.bicgstab(d_A, d_b, d_x)

    return d_x  
end

@load "C:/Users/13733/Desktop/DistributionPowerFlow/J.jld2" J
@load "C:/Users/13733/Desktop/DistributionPowerFlow/F.jld2" F
matrix_sizes = size(J,1)

for n in matrix_sizes
    println("The scale of matrix A: $n x $n")

    A = Float32.(sparse(J))  
    b = Float32.(-F)
    d_A = CUDA.CUSPARSE.CuSparseMatrixCSC(A) 
    d_b = CuArray(b)            

    # CPU time consumption
    cpu_time = @belapsed solve_sparse_cpu($A, $b)
    println("CPU time consumption $cpu_time seconds")

    # GPU time consumption
    gpu_time = @belapsed solve_sparse_gpu($d_A, $d_b)
    println("GPU time consumption: $gpu_time seconds")

    println("speed up (CPU / GPU): $(cpu_time / gpu_time)")

    println("=====================================")
end

The number of non-zero element in matrix J is 23958 and the size of J is 3498*3498.The output result is as follows.

The scale of matrix A: 3498 x 3498
CPU time consumption 0.1821423 seconds
GPU time consumption: 1.2135631 seconds
speed up (CPU / GPU): 0.15008885817309375
=====================================

I found that in highly sparse matrices, solving on GPU is not as fast as on CPU

vettert · October 3, 2024, 7:53am

How big is your system?

automaticvehiclerook · October 3, 2024, 10:16am

A is a 3498*3498 sparse matrix

automaticvehiclerook · October 3, 2024, 10:24am

My GPU is NVIDIA GeForce RTX 4060 laptop,and my CPU is i9-12900HX

amontoison · October 5, 2024, 1:55am

Hi @automaticvehiclerook!

For this kind of small linear systems, the GPU could be less efficient than CPU.
You could try a few things to improve performances:

Use the CuSparseMatrixCSR format, it’s faster if you only need products with A and not its adjoint A'.
Preallocate the workspace with solver = BicgstabSolver(d_A, d_b) and call the method in-place with bicgstab!(solver, d_A, d_b) to check if the majority of time spent is not related to allocations.
Use the package KrylovPreconditioners.jl to create an operator optimized for multiple products A * v. You just need to pass op_A = KrylovOperator(d_A) to bicgstab or bicgstab!.

I also suggest to use CUDA.@profile to check what is the most expensive operation on GPU for your linear system.

Note that you can also have a faster code on CPU if you use multithreaded matrix-vector products and optimized BLAS.
Just adding using MKL, MKLSparse before calling a Krylov solver can lead to a nice speed-up.
The question is about GPU, but it’s better to have a good baseline on CPU for comparison.

amontoison · October 5, 2024, 2:24am

I suggest to also try CUDSS.jl.
It’s an interface to sparse direct methods (LDL’, LL’ or LU) on NVIDIA GPUs.
It’s still a preview library of NVIDIA but it works well.

It should be more efficient than Krylov.jl for small problems.

automaticvehiclerook · October 5, 2024, 4:41am

Thank you for your reply！In fact, what I want to solve is a large-scale linear equation.I tested a 17036*17036 system and it took about 20 seconds on the CPU and 10 seconds on the GPU.I want to use a preconditioner for Bicgstab but I don’t know how to do it on GPU.

automaticvehiclerook · October 5, 2024, 4:44am

I have to adjust the precision to Float64 to meet my engineering requirement

using CUDA
using SparseArrays
using LinearAlgebra
using Krylov
using BenchmarkTools
using JLD2
# Function： Using BiCGStab on CPU to solve the sparse matrix linear equation
function solve_sparse_cpu(A, b)
    x = zeros(Float64, size(b))
    x=Krylov.bicgstab(A, b)
    return x
end

# Function： Using BiCGStab on GPU to solve the sparse matrix linear equation
function solve_sparse_gpu(d_A, d_b)
    
    d_x = CUDA.zeros(Float64, size(d_b))  

    d_x=Krylov.bicgstab(d_A, d_b, d_x)

    return d_x  
end

@load "C:/Users/13733/Desktop/DistributionPowerFlow/J9241" J9241
@load "C:/Users/13733/Desktop/DistributionPowerFlow/F9241" F9241
J=J9241
F=F9241
matrix_sizes = size(J,1)

for n in matrix_sizes
    println("The scale of matrix A: $n x $n")

    A = Float64.(sparse(J))  
    b = Float64.(-F)
    d_A = CUDA.CUSPARSE.CuSparseMatrixCSC(A) 
    d_b = CuArray(b)            

    # CPU time consumption
    cpu_time = @belapsed solve_sparse_cpu($A, $b)
    println("CPU time consumption $cpu_time seconds")

    # GPU time consumption
    gpu_time = @belapsed solve_sparse_gpu($d_A, $d_b)
    println("GPU time consumption: $gpu_time seconds")

    println("speed up (CPU / GPU): $(cpu_time / gpu_time)")

    println("=====================================")
end

amontoison · October 5, 2024, 4:51am

I give a few examples in the documentation of Krylov.jl:

On GPU, I suggest the following preconditioners:

Jacobi / block-Jacobi
Incomplete Cholesky with zero fill-in – IC(0)
Incomplete LU with zero fill-in – ILU(0)

All of them are available in KrylovPreconditioners.jl.
You can find some examples here of how to use kp_ic0, kp_il0 and BlockJacobiPreconditioner on GPUs.

automaticvehiclerook · October 5, 2024, 5:35am

Thank you very much for your help. I will try it

Topic		Replies	Views
(fast) sparse solve on GPU General Usage question , gpu	7	1667	December 28, 2019
Solves the linear system using CuArrays.jl GPU	3	1602	December 27, 2019
Use of linear operators on a GPU GPU question	3	503	January 18, 2023
Example of direct linear solve, Sparse Matrix in cuda General Usage gpu , linearalgebra	2	1079	March 18, 2019
Help using CUDA to exponentiate Matrix GPU	3	406	April 27, 2024

How to use GPU acceleration to solve linear equation Ax=b

Related topics