First CUDA Program - Addressing Scalar Indexing

Hello, I am trying to get my feet wet in the world of CUDA programming. I started with some CPU code that I had previously written, but I have been having trouble with scalar indexing slowing down the computation. The problem stems from this line: x = v + (A_GPU' * (Vector_GPU - A_GPU * v)). I believe it’s due to the addition, but I wasn’t sure. I was hoping someone might have recommendations on how to optimize this calculation, or could point me to documentation or examples that would help me understand it better.

I’ve included some sample code below showing my initial implementation.

using CUDA, SparseArrays
using CUDA.CUSPARSE

if has_cuda_gpu()
    CUDA.allowscalar(false)
end

function GPU_test(A_GPU, Vector_GPU, niter)
    @views @inbounds begin
        v = CUDA.zeros(size(A_GPU,2))
        x_old = CUDA.zeros(size(A_GPU,2))
        for iter = 1:niter
            x = v + (A_GPU' * (Vector_GPU - A_GPU * v))
            v = x + 0.5 * (x - x_old)
            x_old = x           
        end
    end
    return x
end

function GPU_init(A_CPU,Vector_CPU, n, niter_GPU)
    A_GPU = CuSparseMatrixCSR(A_CPU)
    Vector_GPU = CuArray{ComplexF32}(undef,size(Vector_CPU))
    copyto!(Vector_GPU, Vector_CPU)
    return GPU_test(A_GPU, Vector_GPU, niter_GPU)
end
## ---
m = 800
n = 2500
d = 0.75
niter_GPU = 100

A_CPU = sprand(ComplexF32,m,n,d)
Vector_CPU = zeros(ComplexF32,m)

out = GPU_init(A_CPU,Vector_CPU, n, niter_GPU)

You should look at the backtrace of the error message:

julia> out = GPU_init(A_CPU,Vector_CPU, n, niter_GPU)
ERROR: Scalar indexing is disallowed.
...
Stacktrace:
...
  [7] generic_matvecmul!(C::CuArray{ComplexF32, 1}, tA::Char, A::CuSparseMatrixCSR{ComplexF32}, B::CuArray{Float32, 1}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:759
  [8] mul!
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:81 [inlined]
  [9] mul!
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:275 [inlined]
 [10] *(A::CuSparseMatrixCSR{ComplexF32}, x::CuArray{Float32, 1})
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:51
 [11] GPU_test(A_GPU::CuSparseMatrixCSR{ComplexF32}, Vector_GPU::CuArray{ComplexF32, 1}, niter::Int64)
    @ Main ./REPL[5]:6

It’s clear here that this is triggered by *: you’re doing sparse × dense multiplication with mismatched element types (a ComplexF32 matrix times a Float32 vector), which falls back to the generic Base implementation. For sparse operations we only use CUSPARSE (i.e. we don’t have a natively-implemented generic fallback), so you need to stick to what CUSPARSE supports.

Thank you for pointing that out! Though looking through the CUSPARSE.jl documentation, it appears that it should support sparse × dense multiplication. Are you saying that the issue stems from the type difference, i.e. ComplexF32 vs Float32?

Yes. It’s possible CUSPARSE nowadays has generic APIs that allow mixing types (happy to take a PR for this), but the old APIs used to be pretty constrained.
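
In your case, the easiest fix is to make the dense vectors match the element type of the sparse matrix. CUDA.zeros without an explicit element type defaults to Float32 (that’s where the CuArray{Float32, 1} in frame [7] of the stack trace comes from), so something along these lines (an untested sketch) should route the multiplication through CUSPARSE:

v     = CUDA.zeros(ComplexF32, size(A_GPU, 2))   # match eltype(A_GPU)
x_old = CUDA.zeros(ComplexF32, size(A_GPU, 2))

or, more generically, CUDA.zeros(eltype(A_GPU), size(A_GPU, 2)).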

Thank you for the help and suggestion! Addressing the type issue eliminated the error, and along the way I also spotted several ways to improve this basic algorithm.
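
Here is a rough sketch of one of them: reusing preallocated buffers with in-place mul! and broadcasting instead of allocating new arrays every iteration. This assumes mul! dispatches to CUSPARSE for both A_GPU and A_GPU'; the buffer name r is just mine, and I haven’t benchmarked it yet:

using LinearAlgebra

function GPU_test!(A_GPU, Vector_GPU, niter)
    T = eltype(A_GPU)
    v     = CUDA.zeros(T, size(A_GPU, 2))
    x     = CUDA.zeros(T, size(A_GPU, 2))
    x_old = CUDA.zeros(T, size(A_GPU, 2))
    r     = CUDA.zeros(T, size(A_GPU, 1))   # residual buffer of length m
    for iter = 1:niter
        mul!(r, A_GPU, v)                   # r = A * v
        r .= Vector_GPU .- r                # r = b - A * v
        mul!(x, A_GPU', r)                  # x = A' * (b - A * v)
        x .+= v                             # x = v + A' * (b - A * v)
        v .= x .+ T(0.5) .* (x .- x_old)    # extrapolation step
        x_old .= x
    end
    return x
end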

I was wondering if you had any recommendations on resources or examples that would be good to learn from? I really like how simple it can be to perform a calculation on a GPU, but I would like to dive deeper into GPU programming, specifically writing my own kernels and such.

There’s a great introductory tutorial here: Introduction · CUDA.jl
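
And just to give you a taste of what that tutorial builds up to: a hand-written kernel isn’t much more code than the array version. A minimal (untested) vector-add kernel might look like this; the tutorial walks through the same pattern step by step:

function vadd!(c, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x   # global thread index
    if i <= length(c)                                       # guard against out-of-bounds threads
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing                                          # kernels must return nothing
end

a = CUDA.rand(Float32, 1024)
b = CUDA.rand(Float32, 1024)
c = similar(a)
@cuda threads=256 blocks=cld(length(a), 256) vadd!(c, a, b)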