How to perform a sparse-dense matrix product with addition (CUDA library style)

I want to compute the following:

result = -1 * (e * (e' * U)) + M * U

where e = ones(n, 1), M is an n x n CuSparseMatrixCSR, and U is an n x r dense CuMatrix.
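Because e is a column of ones, e' * U is simply the 1 x r row of column sums of U, and e * (e' * U) repeats that row n times. This is what lets the outer product be replaced by a broadcast. A small CPU-only sketch that checks the identity (the sizes and random data are arbitrary choices for illustration):

```julia
using LinearAlgebra

n, r = 5, 3
U = rand(n, r)
e = ones(n, 1)                 # e = ones(n, 1), as in the question

# e' * U is the 1 x r row vector of column sums of U
colsums = e' * U
@assert colsums ≈ sum(U, dims=1)

# e * (e' * U) repeats that row n times; broadcasting gives the same n x r result
lhs = -(e * (e' * U))
rhs = -e .* sum(U, dims=1)
@assert lhs ≈ rhs
```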

As I understand it, the expression above launches four kernels. I am trying to reduce that to three by using a single call that performs both the product and the sum, like the SpMM routine in the CUDA libraries (which computes C = alpha*A*B + beta*C). Something like the function below:

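```julia
using CUDA, CUDA.CUSPARSE, LinearAlgebra

function grad_function(e::CuArray, U::CuArray, M::CuSparseMatrixCSR)
  # -e * (e' * U): the column sums of U, broadcast down all n rows
  out = -e .* sum(U, dims=1)
  alpha = one(eltype(U))   # match U's element type (e.g. Float32)
  beta = one(eltype(U))
  mul!(out, M, U, alpha, beta) # out = alpha*(M*U) + beta*out
  return out
end
```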
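The same algebra can be checked on the CPU before moving to the GPU. The sketch below is a hypothetical CPU analogue (grad_function_cpu is an illustrative name), assuming SparseMatrixCSC and Matrix stand in for CuSparseMatrixCSR and CuMatrix. It uses the 5-argument mul! from LinearAlgebra, which SparseArrays implements for sparse-times-dense products, and compares the fused result against the original four-operation expression:

```julia
using LinearAlgebra, SparseArrays

# Hypothetical CPU analogue of grad_function: SparseMatrixCSC/Matrix stand in
# for CuSparseMatrixCSR/CuMatrix (an assumption for illustration only).
function grad_function_cpu(e::AbstractVecOrMat, U::AbstractMatrix, M::SparseMatrixCSC)
    T = eltype(U)
    # out = -e * (e' * U), formed by broadcasting the column sums of U
    out = -e .* sum(U, dims=1)
    # 5-argument mul!: out = 1*(M*U) + 1*out, accumulated in place
    mul!(out, M, U, one(T), one(T))
    return out
end

n, r = 50, 4
M = sprand(n, n, 0.1)
U = rand(n, r)
e = ones(n, 1)

fused = grad_function_cpu(e, U, M)
reference = -1 * (e * (e' * U)) + M * U
@assert fused ≈ reference
```

With a CuSparseMatrixCSR and CuMatrix in place of the CPU types, the same 5-argument mul! call is handled by CUDA.jl's CUSPARSE wrappers, so the structure of the function carries over unchanged.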

Is it possible to do this with CUDA.jl?

SpMM is available at three levels: directly, by calling CUSPARSE.cusparseSpMM; through the slightly higher-level CUSPARSE.mm! (CUDA.jl/lib/cusparse/generic.jl at 5b470c4614f7388c7b5b0938a93f781294673e80, JuliaGPU/CUDA.jl on GitHub); and via LinearAlgebra.mul! when called on sparse arrays (CUDA.jl/lib/cusparse/interfaces.jl at 5b470c4614f7388c7b5b0938a93f781294673e80, JuliaGPU/CUDA.jl on GitHub).
