Indexing 2D CuArray

vavrines · July 17, 2021, 7:03pm

I’m learning to use CUDA.jl by following the tutorial, and I’m trying to write a kernel for a 2D array.

using CUDA

function divide0!(y, a, b)
    for j = 1:size(y, 2)
        for i = 1:size(y, 1)
            @inbounds y[i, j] = a[i, j] / b[i]
        end
    end

    return nothing
end

function divide1!(y, a, b)
    idx = threadIdx().x
    idy = threadIdx().y
    strx = blockDim().x
    stry = blockDim().y
    for j = idy:stry:size(y, 2)
        for i = idx:strx:size(y, 1)
            @inbounds y[i, j] = a[i, j] / b[i]
        end
    end

    return nothing
end

function divide2!(y, a, b)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    idy = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    for j = idy
        for i = idx
            @inbounds y[i, j] = a[i, j] / b[i]
        end
    end

    return nothing
end

a = rand(32, 16) |> CuArray
b = rand(32) |> CuArray

y0 = zero(a)
y1 = zero(a)
y2 = zero(a)

@cuda divide0!(y0, a, b)
@cuda threads=32 divide1!(y1, a, b)
@cuda threads=32 blocks=16 divide2!(y2, a, b)

However, I found that divide2! doesn’t give the same result as divide0! and divide1!.

julia> y0 == y1
true

julia> y1 == y2
false

Did I make a silly mistake here?

xiaodai · July 17, 2021, 11:47pm

Neither idx nor idy is a range, so each loop body runs only once per thread. You could have figured this out by printing idx and idy.
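To spell out the point above: a scalar is iterable in Julia, so `for j = idy` runs exactly once. With `threads=32 blocks=16` both launch dimensions are one-dimensional, so `threadIdx().y`, `blockIdx().y` and `blockDim().y` are all 1. That makes idy always 1 while idx runs past the 32 rows, and `@inbounds` hides the out-of-range accesses. The following is a minimal CPU-only sketch of that index arithmetic; `emulate_indices` is a hypothetical helper written for illustration, not part of CUDA.jl, and it assumes the 32×16 array from the question.

```julia
# CPU-only emulation of the index arithmetic in the original divide2!.
# `emulate_indices` is a hypothetical helper, not part of CUDA.jl.
function emulate_indices(; threads=(32, 1), blocks=(16, 1))
    out = Tuple{Int,Int}[]
    for by in 1:blocks[2], bx in 1:blocks[1], ty in 1:threads[2], tx in 1:threads[1]
        idx = (bx - 1) * threads[1] + tx   # same formula as in the kernel
        idy = (by - 1) * threads[2] + ty
        push!(out, (idx, idy))
    end
    return out
end

# Launch from the question: threads=32 blocks=16 (both one-dimensional).
launch_1d = emulate_indices()
idx_max_1d = maximum(first, launch_1d)    # largest row index any thread computes
idy_vals_1d = unique(last.(launch_1d))    # distinct column indices that occur

# A two-dimensional launch of the same 512 threads.
launch_2d = emulate_indices(threads=(32, 1), blocks=(1, 16))
covers_all = Set(launch_2d) == Set((i, j) for i in 1:32, j in 1:16)

println((idx_max_1d, idy_vals_1d, covers_all))
```

In the 1D launch idx reaches 512 and idy only ever takes the value 1, whereas the 2D launch assigns each (i, j) pair of the 32×16 array to exactly one thread. On a GPU, that would correspond to something like `@cuda threads=(32, 1) blocks=(1, 16) divide2!(y2, a, b)` with the original kernel, ideally with a bounds check added for sizes that do not match the launch shape.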

fedoroff · July 18, 2021, 6:37am

With CartesianIndices you can iterate using the linear index and at the same time have access to the indices along each dimension:

function divide2!(y, a, b)
    id = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x

    Nx, Ny = size(y)

    cind = CartesianIndices((Nx, Ny))

    for k=id:stride:Nx*Ny
        i = cind[k][1]
        j = cind[k][2]
        @inbounds y[i, j] = a[i, j] / b[i]
    end

    return nothing
end
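As a small illustration of the mapping used in the kernel above, here is a CPU-only sketch of how CartesianIndices turns a linear index k into (i, j) for a column-major 32×16 array. The sizes and the sample value k = 40 are assumptions chosen to match the question, not something from the post itself.

```julia
# CPU-only look at the linear-to-Cartesian mapping used in the kernel above.
Nx, Ny = 32, 16
cind = CartesianIndices((Nx, Ny))
lind = LinearIndices((Nx, Ny))

k = 40                     # sample linear index (arbitrary choice)
i, j = Tuple(cind[k])      # row and column for that linear index

# Converting back with LinearIndices recovers every linear index.
roundtrip = all(lind[cind[n]] == n for n in 1:Nx*Ny)

println((i, j, roundtrip))
```

Because Julia arrays are column-major, consecutive values of k walk down a column first, so neighbouring threads touch neighbouring memory locations. This kernel only uses the x dimension of the launch, so a 1D configuration such as `threads=32 blocks=16` is sufficient for it.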

vavrines · July 18, 2021, 8:27am

Following the code above, this version also works.

function divide2!(y, a, b)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    idy = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    strx = blockDim().x * gridDim().x
    stry = blockDim().y * gridDim().y
    Nx, Ny = size(y)

    for j = idy:stry:Ny
        for i = idx:strx:Nx
            @inbounds y[i, j] = a[i, j] / b[i]
        end
    end

    return nothing
end

I’m noting it here for reference.
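One way to see why the strided version is correct for any launch shape is to emulate its loops on the CPU and count how often each element is written. The sketch below does that; `coverage` is a hypothetical helper for illustration only, and the launch shapes passed to it are assumed examples.

```julia
# CPU-only emulation of the 2D grid-stride loops from the kernel above.
# `coverage` is a hypothetical helper, not part of CUDA.jl.
function coverage(Nx, Ny; threads=(32, 1), blocks=(16, 1))
    hits = zeros(Int, Nx, Ny)              # how many times each element is written
    strx = threads[1] * blocks[1]          # blockDim().x * gridDim().x
    stry = threads[2] * blocks[2]          # blockDim().y * gridDim().y
    for by in 1:blocks[2], bx in 1:blocks[1], ty in 1:threads[2], tx in 1:threads[1]
        idx = (bx - 1) * threads[1] + tx
        idy = (by - 1) * threads[2] + ty
        for j in idy:stry:Ny, i in idx:strx:Nx
            hits[i, j] += 1
        end
    end
    return hits
end

ok_1d  = all(==(1), coverage(32, 16))                                  # launch from the question
ok_2d  = all(==(1), coverage(32, 16; threads=(16, 16), blocks=(2, 1))) # 2D launch
ok_odd = all(==(1), coverage(100, 7; threads=(8, 8), blocks=(3, 2)))   # size not a multiple of the block

println((ok_1d, ok_2d, ok_odd))
```

Every element is written exactly once in all three cases. Note that with the 1D launch from the question only the 32 threads of the first block have idx ≤ 32, and each of them loops over all 16 columns, so a 2D launch such as `@cuda threads=(16, 16) blocks=(2, 1) divide2!(y2, a, b)` would likely spread the work more evenly.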
