Using sparse matrix in CUDA kernel

joel · November 25, 2023, 9:41am

Hi,
I’m trying to call nonzeros on a sparse matrix in my CUDA kernel but I’m getting a dynamic function invocation error: unsupported dynamic function invocation (call to nonzeros).

Is this simply not supported or am I missing something?

Here’s an example that should reproduce the error:

using CUDA
using SparseArrays

function kernel(sm)
    vals = nonzeros(sm)
    for val in vals
        @cuprintln("$val")
    end
    nothing
end

sm = cu(sprand(5, 5, 0.5))

@cuda kernel(sm)

maleadt · November 27, 2023, 5:05pm

Device-side functionality for sparse arrays is practically nonexistent. Many sparse array-related functions (even plain indexing) would require iteration, which is not something you want to do on each thread.
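To illustrate why indexing needs iteration, here is a minimal CPU-side sketch of looking up a single element in compressed sparse column (CSC) storage. `csc_getindex` is a hypothetical helper written for this example, not a function from SparseArrays or CUDA.jl, and it uses a plain linear scan for simplicity; the point is only that some search over the column's stored entries is unavoidable.

```julia
using SparseArrays

# Hypothetical helper (illustration only): fetch A[i, j] from CSC storage by hand.
function csc_getindex(A::SparseMatrixCSC, i::Integer, j::Integer)
    rows = rowvals(A)   # row index of every stored entry
    vals = nonzeros(A)  # value of every stored entry
    # nzrange(A, j) gives the positions of the entries stored for column j.
    for k in nzrange(A, j)
        rows[k] == i && return vals[k]
    end
    return zero(eltype(A))  # nothing stored at (i, j): a structural zero
end

A = sparse([1, 3, 2], [1, 1, 3], [10.0, 20.0, 30.0], 3, 3)
csc_getindex(A, 3, 1)  # stored entry
csc_getindex(A, 2, 1)  # structural zero
```

If every GPU thread did such a lookup per element, each access would cost a loop over a column rather than a single memory read.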

If you’re instead looking to implement sparse array kernels yourself, have a look at the implementation of broadcast for sparse arrays in the CUDA.jl source code, https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cusparse/broadcast.jl, but beware that this isn’t simple code. For simple element-wise operations like broadcast, you can basically assign one thread per compressed row (or column) and use a for loop to iterate over its elements. That is what the linked code does, using iteration helper structures to deduplicate code. For more complex operations (like matmul), that approach isn’t viable.
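As a rough sketch of the thread-per-compressed-column idea (not the code from broadcast.jl), the example below computes column sums of a CSC matrix. It assumes a working CUDA GPU and sidesteps the device-side sparse type entirely by uploading the raw CSC vectors as plain `CuArray`s. The kernel name `colsum_kernel!` and the launch configuration are illustrative choices.

```julia
using CUDA
using SparseArrays

# One thread per compressed column: thread j sums the stored values of column j.
function colsum_kernel!(out, colptr, nzval)
    j = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if j <= length(out)
        acc = zero(eltype(out))
        # Stored entries of column j sit at positions colptr[j] : colptr[j+1]-1.
        for k in colptr[j]:(colptr[j + 1] - 1)
            acc += nzval[k]
        end
        out[j] = acc
    end
    return nothing
end

A = sprand(Float32, 5, 5, 0.5)               # CPU sparse matrix in CSC format
colptr = CuArray(SparseArrays.getcolptr(A))  # column pointers, uploaded as a dense vector
nzval = CuArray(nonzeros(A))                 # stored values, uploaded as a dense vector
out = CUDA.zeros(Float32, size(A, 2))

@cuda threads=32 colsum_kernel!(out, colptr, nzval)
Array(out)  # column sums, copied back to the host
```

The same pattern covers the original question: passing the vector of stored values to the kernel as an ordinary `CuArray` lets each thread read it directly, without calling `nonzeros` on the device.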