Slicing a CuSparseMatrixCSC matrix

I have a question about slicing a sparse matrix on the GPU. I don't understand why the result matrix only starts being assigned values from its second row. This is an unexpected result for me. How can I fix the error?

using CUDA
using CUDA.CUSPARSE
using SparseArrays
using Printf

a = [1,2,3]
b = [1,2,3]
n = 10  # size of the matrix
density = 0.7  # density of the non-zero elements
sparse_matrix = sprand(n, n, density)
na = length(a)
nb = length(b)
a_gpu = CUDA.CuArray(a)
b_gpu = CUDA.CuArray(b)
result = CUDA.zeros(Float32, na, nb)
# Convert to CuSparseMatrixCSC
sparse_matrix_gpu = CUDA.CUSPARSE.CuSparseMatrixCSC(sparse_matrix)
# The column pointers
col = sparse_matrix_gpu.colPtr
# The row indices
row = sparse_matrix_gpu.rowVal
# The nonzero values
val = sparse_matrix_gpu.nzVal

function kernel_index!(result, row_indices, col_ptrs, val, a, b, na, nb)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x + 1
    if i <= na
        for j in 1:nb
            row = a[i] - 1
            col = b[j]
            value = 0.0
            for k in col_ptrs[col]:(col_ptrs[col+1] - 1)
                if row_indices[k] == row
                    value = val[k]
                    break
                end
            end
            result[i, j] = value
        end
    end
    return
end
threads_per_block = 256
blocks_per_grid = ceil(Int, na / threads_per_block)
@cuda threads=threads_per_block blocks=blocks_per_grid kernel_index!(result, row, col, val, a_gpu, b_gpu, na, nb)

The result matrix is:

julia> result
3×3 CuArray{Float32, 2, CUDA.DeviceMemory}:
 0.0       0.0       0.0
 0.0       0.248917  0.263072
 0.128449  0.0       0.620226

It only stores sparse_matrix_gpu[1:2, 1:3] and starts assigning values from the second row of the result matrix.

Hi, and welcome to the Julia community!

In your index computation

i = threadIdx().x + (blockIdx().x - 1) * blockDim().x + 1

you correctly noted that blockIdx().x starts at 1, so you need to subtract 1 in the offset calculation. But threadIdx().x also begins at 1, meaning that to end up with a 1-based i you don't need to subtract (or add) anything further. So the correct code would be

i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
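
With this change you will presumably also want row = a[i] instead of row = a[i] - 1 in the kernel: CuSparseMatrixCSC keeps SparseArrays' 1-based row indices in rowVal, so there is no offset to undo there either. Putting both fixes together, the kernel could look like this (an untested sketch with the same signature as yours, not a definitive version):

function kernel_index!(result, row_indices, col_ptrs, val, a, b, na, nb)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= na
        for j in 1:nb
            row = a[i]  # rowVal is 1-based, so no -1 here
            col = b[j]
            value = zero(eltype(val))
            # scan column `col` of the CSC structure for an entry in row `row`
            for k in col_ptrs[col]:(col_ptrs[col+1] - 1)
                if row_indices[k] == row
                    value = val[k]
                    break
                end
            end
            result[i, j] = value
        end
    end
    return
end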

In general these index manipulations can be confusing, so I would advise manually trying out the boundary cases. E.g. the ‘first’ thread has threadIdx().x == 1 == blockIdx().x and should end up with i = 1.
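
You can replicate the index formula in ordinary CPU code to check such boundary cases; a minimal sketch (the helper linear_index is made up for illustration, with blockDim().x assumed to be 256):

# CPU replica of the in-kernel index computation
linear_index(threadidx, blockidx, blockdim) = threadidx + (blockidx - 1) * blockdim

@assert linear_index(1, 1, 256) == 1      # first thread of the first block
@assert linear_index(256, 1, 256) == 256  # last thread of the first block
@assert linear_index(1, 2, 256) == 257    # first thread of the second block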

As a side-note, the literal 1 is of type Int64, which you typically want to avoid on the GPU. Instead you should use a 32-bit version, e.g. 1i32, after importing it via using CUDA: i32. See the Performance Tips in the CUDA.jl documentation for more information.
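
For example, a minimal self-contained kernel using the 1i32 literal (a sketch; the kernel demo! and its body are just for illustration):

using CUDA
using CUDA: i32

function demo!(out)
    # 1i32 keeps the index arithmetic in Int32 instead of promoting it to Int64
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    if i <= length(out)
        out[i] = i
    end
    return
end

out = CUDA.zeros(Int32, 10)
@cuda threads=4 blocks=3 demo!(out)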

Thank you very much for your help. It is very helpful for improving efficiency.
