Indexing in GPU kernel

I’m building a custom GPU kernel with CUDA.jl and am having trouble indexing into a GPU vector inside it. Q is a sparse matrix stored in CSR format (see the small example after the list):
qrows - row pointers
qcols - nonzero column indices
qvals - nonzero values
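
For reference, this is how a small matrix maps onto those three arrays (a toy example, not my actual data, using 1-based indexing):

# 3x3 matrix with nonzeros A[1,1] = 2.0, A[1,3] = 1.0, A[3,2] = 5.0
qrows = [1, 3, 3, 4]     # row i's nonzeros sit at positions qrows[i]:qrows[i+1]-1
qcols = [1, 3, 2]        # column index of each stored nonzero
qvals = [2.0, 1.0, 5.0]  # value of each stored nonzero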
Here is a code snippet:

function q_kernel!(qrows::CuDeviceVector{Int64}, qcols::CuDeviceVector{Int64}, qvals::CuDeviceVector{Float64})
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    tot::Float64 = 0.0
    @simd for i = index:stride:length(qrows)-1 # loop over the rows of the matrix (same number of rows for J and Q)
        @inbounds colind = qrows[i]:qrows[i+1]-1
        @inbounds indj = qcols[colind] # find column indices of all the nonzero elements of row i in Q
        for j in indj # this loops over all the nonzero elements of row i in Q

The second-to-last line, indj = qcols[colind], throws these errors:

LoadError: InvalidIRError: compiling kernel
Reason: unsupported dynamic function invocation (call to print_to_string(xs...) in Base at strings/io.jl:133)
Reason: unsupported call through a literal pointer (call to ijl_alloc_array_1d)

What is the correct way to index into qcols on the GPU?

You can try using a view; qcols[colind] creates a copy, which requires allocating a new array on the device, and that is not supported inside a GPU kernel (hence the ijl_alloc_array_1d in the error).
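
For example, something like this should avoid the allocation (a quick sketch, keeping the variable names from your snippet):

@inbounds indj = @view qcols[colind] # a window into qcols, no device-side allocation
for j in indj # j is the column index of a nonzero element of row i
    # use j (and the corresponding entry of qvals) here
end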


Thanks, this worked!

@inbounds indj = @view qcols[colind] # find column indices of all the nonzero elements of row i in Q
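
For completeness, here is a minimal sketch of how the whole kernel could look with views; the output vector out, the per-row accumulation, and the launch line are illustrative assumptions, not from this thread:

using CUDA

function q_kernel!(out::CuDeviceVector{Float64}, qrows::CuDeviceVector{Int64},
                   qcols::CuDeviceVector{Int64}, qvals::CuDeviceVector{Float64})
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    for i = index:stride:length(qrows)-1 # grid-stride loop over the rows
        tot = 0.0
        @inbounds colind = qrows[i]:qrows[i+1]-1 # positions of row i's nonzeros
        @inbounds indj = @view qcols[colind]     # column indices, no allocation
        @inbounds valj = @view qvals[colind]     # matching values, no allocation
        for k in eachindex(indj)
            @inbounds j = indj[k]    # column index of the k-th nonzero in row i
            @inbounds tot += valj[k] # placeholder computation; a real kernel would also use j
        end
        @inbounds out[i] = tot
    end
    return nothing
end

# Hypothetical launch, with out, qrows, qcols, qvals as CuArrays:
# nrows = length(qrows) - 1
# @cuda threads=256 blocks=cld(nrows, 256) q_kernel!(out, qrows, qcols, qvals)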