I’m building a custom GPU kernel and am having trouble indexing into a GPU vector. Q is a matrix stored in CSR format with the usual three arrays (a small example of the layout is given below):

- `qrows`: row pointers
- `qcols`: column indices of the nonzero entries
- `qvals`: values of the nonzero entries
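
To make the layout concrete, here is a tiny hypothetical 3×3 example of how I understand the three arrays to relate (not my actual data):

```julia
# Hypothetical 3x3 matrix, used only to illustrate the CSR layout:
#   [ 1.0  0.0  2.0 ]
#   [ 0.0  3.0  0.0 ]
#   [ 4.0  0.0  5.0 ]
qrows = [1, 3, 4, 6]               # row i spans entries qrows[i]:qrows[i+1]-1
qcols = [1, 3, 2, 1, 3]            # column index of each stored nonzero
qvals = [1.0, 2.0, 3.0, 4.0, 5.0]  # value of each stored nonzero
```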
Here is a code snippet:
```julia
function q_kernel!(qrows::CuDeviceVector{Int64}, qcols::CuDeviceVector{Int64}, qvals::CuDeviceVector{Float64})
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    tot::Float64 = 0.0
    @simd for i = index:stride:length(qrows)-1  # loop over the rows of the matrix (same number of rows for J and Q)
        @inbounds colind = qrows[i]:qrows[i+1]-1
        @inbounds indj = qcols[colind]  # find column indices of all the nonzero elements of row i in Q
        for j in indj  # this loops over all the nonzero elements of row i in Q
```
The second-to-last line, `indj = qcols[colind]`, triggers the following errors:
```
LoadError: InvalidIRError: compiling kernel
Reason: unsupported dynamic function invocation (call to print_to_string(xs...) in Base at strings/io.jl:133)
Reason: unsupported call through a literal pointer (call to ijl_alloc_array_1d)
```
What is the correct way to index into qcols on the GPU?
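
My guess is that the slice `qcols[colind]` tries to allocate a temporary array on the device, which is not allowed inside a kernel. Here is a minimal sketch of the workaround I have in mind, replacing the slice with scalar loads over the same index range (the loop body is a placeholder and the function name is my own; I am not sure this is the idiomatic approach):

```julia
using CUDA

function q_kernel_scalar!(qrows, qcols, qvals)
    index  = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    for i = index:stride:length(qrows)-1          # grid-stride loop over the rows
        @inbounds for k = qrows[i]:qrows[i+1]-1   # nonzero entries of row i
            j = qcols[k]                          # column index, scalar load (no allocation)
            v = qvals[k]                          # corresponding value
            # ... use j and v here ...
        end
    end
    return nothing
end
```

I would launch it with something like `@cuda threads=256 blocks=cld(length(d_qrows)-1, 256) q_kernel_scalar!(d_qrows, d_qcols, d_qvals)` (device-array names hypothetical). Is that the right way to do it, or is there a better pattern for this?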