Using sparse matrix in CUDA kernel

I’m trying to call nonzeros on a sparse matrix in my CUDA kernel but I’m getting a dynamic function invocation error: unsupported dynamic function invocation (call to nonzeros).

Is this simply not supported or am I missing something?

Here’s an example that should reproduce the error:

using CUDA
using SparseArrays

function kernel(sm)
    vals = nonzeros(sm)
    for val in vals
        # loop body omitted
    end
    return nothing
end

sm = cu(sprand(5, 5, 0.5))

@cuda kernel(sm)

Device-side functionality for sparse arrays is practically nonexistent. Many sparse array-related functions (even plain indexing) require iterating over the stored entries, which is not something you want to do on each thread.
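To illustrate why indexing implies iteration, here is a minimal CPU-side sketch of what looking up a single element of a CSC matrix involves. The `lookup` function is a hypothetical helper written for this example, not part of SparseArrays or CUDA.jl; it only uses the standard `rowvals`, `nonzeros` and `nzrange` accessors.

```julia
using SparseArrays

# Hypothetical helper: fetch A[i, j] by scanning the stored entries of column j.
function lookup(A::SparseMatrixCSC, i::Integer, j::Integer)
    rows = rowvals(A)    # row index of every stored entry
    vals = nonzeros(A)   # value of every stored entry
    for k in nzrange(A, j)           # positions of the entries in column j
        rows[k] == i && return vals[k]
    end
    return zero(eltype(A))           # (i, j) is not stored
end

A = sparse([1, 3], [2, 2], [10.0, 30.0], 3, 3)
lookup(A, 3, 2)   # stored entry
lookup(A, 2, 2)   # not stored, so this falls through to zero
```

On a GPU, every thread doing such a scan for every access is the cost being warned about here.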

If you’re instead looking into actually implementing sparse array kernels, have a look at the implementation of broadcast for sparse arrays in the CUDA.jl source code, but beware that this isn’t simple code. For simple element-wise operations like broadcast, you can basically use one thread per compressed row (or column) and a for loop to iterate over that row’s (or column’s) elements. That is what the linked code does, using iteration helper structures to deduplicate code. For more complex operations (like matmul), that approach isn’t viable.
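As a rough sketch of the thread-per-column idea (not the CUDA.jl broadcast implementation itself), the kernel below sums the stored values of each column. It sidesteps device-side sparse support by passing the underlying dense vectors to the kernel. It assumes the `CuSparseMatrixCSC` type exposes its storage as the fields `colPtr` and `nzVal`, which may differ between CUDA.jl versions; `colsum_kernel!` is a name made up for this example.

```julia
using CUDA
using CUDA.CUSPARSE
using SparseArrays

# One thread per column: each thread loops over the stored values of its column.
function colsum_kernel!(out, colptr, nzval)
    col = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if col <= length(out)
        acc = zero(eltype(out))
        # colptr[col] .. colptr[col + 1] - 1 are the positions of this column's entries
        for k in colptr[col]:(colptr[col + 1] - 1)
            acc += nzval[k]
        end
        out[col] = acc
    end
    return nothing
end

A = sprand(Float32, 5, 5, 0.5)
dA = CuSparseMatrixCSC(A)                  # upload in CSC format
out = CUDA.zeros(Float32, size(A, 2))

# Assumption: colPtr and nzVal are the field names of the device-side CSC storage.
@cuda threads=256 blocks=cld(size(A, 2), 256) colsum_kernel!(out, dA.colPtr, dA.nzVal)

Array(out) ≈ vec(sum(A; dims=1))           # compare against the CPU result
```

This works because each column is independent. For something like matmul, the work per output element depends on the sparsity pattern of both operands, so a plain per-column loop no longer maps well onto threads.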