CUSPARSE matrix-matrix multiplication not using GPU

I’m trying to perform sparse matrix-matrix multiplication on the GPU, which looks like it should be supported by CUSPARSE via cusparseSpmm. But it doesn’t seem to run on the GPU when I use Julia’s CUDA.CUSPARSE:

using CUDA, CUDA.CUSPARSE, LinearAlgebra, SparseArrays
# 1000×1000 sparse identity matrices on the CPU
A = SparseMatrixCSC{Float64,Int64}(I,1000,1000)
B = SparseMatrixCSC{Float64,Int64}(I,1000,1000)
# Copy them to the GPU
cuA = CuSparseMatrixCSC(A)
cuB = CuSparseMatrixCSC(B)
cuC = similar(cuA)
cuv = CuVector(ones(1000))
# Sparse * sparse, out-of-place and in-place
cuA*cuB
mul!(cuC,cuA,cuB)

The cuX matrices seem to be created on the GPU correctly:

julia> typeof(cuA)
CuSparseMatrixCSC{Float64, Int32}

But both * and mul! warn about scalar indexing, which suggests they are falling back to a generic method instead of calling a CUSPARSE routine.

Sparse-matrix * vector multiplication does seem to be working correctly on the GPU:

julia> cuA*cuv
1000-element CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}
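
One way to check which operations hit the slow fallback is to disallow scalar indexing, so that the warning becomes an error. The sketch below assumes a working GPU and reuses the setup from the question; the `try`/`catch` wrapper and the printed messages are only illustrative, and whether the sparse * sparse product succeeds depends on the installed CUDA.jl version.

```julia
using CUDA, CUDA.CUSPARSE, LinearAlgebra, SparseArrays

# Make scalar indexing of GPU arrays throw an error instead of warning.
CUDA.allowscalar(false)

A   = sparse(1.0I, 1000, 1000)     # 1000×1000 sparse identity on the CPU
cuA = CuSparseMatrixCSC(A)
cuB = CuSparseMatrixCSC(A)
cuv = CuVector(ones(1000))

# Sparse matrix * dense vector: reported above to work on the GPU.
y = cuA * cuv

# Sparse * sparse: if this dispatches to a generic fallback that indexes
# element by element, it now fails loudly instead of running slowly.
try
    cuA * cuB
    println("sparse * sparse completed without scalar indexing")
catch err
    println("sparse * sparse failed (likely a scalar indexing fallback): ", err)
end
```

If the second product throws, the multiplication was not being dispatched to a GPU kernel in that version of the package.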

What am I doing wrong here?

Here are the system details:

Julia Version 1.8.3 (2022-11-14)
CUDA v3.12.0

julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 510.73.8, for CUDA 11.6
CUDA driver 11.7

Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+510.73.8
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.3
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA A10G (sm_86, 22.196 GiB / 22.488 GiB available)

Try the latest master; much has changed in the CUSPARSE wrappers.

@tsc25 It should work if you use the branch master:
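
A minimal sketch of switching to the development branch, assuming the standard Pkg workflow; this is one common way to do it and not necessarily the exact command the reply had in mind.

```julia
using Pkg

# Track the master branch of CUDA.jl instead of the registered release.
Pkg.add(name="CUDA", rev="master")

# To go back to registered releases later:
# Pkg.free("CUDA")
```

After restarting Julia, the sparse * sparse product from the question can be rerun to see whether the scalar indexing warning is gone.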
