CUDAnative: kernel multidimensional access

Colin_Beckingham · February 3, 2017, 10:29am

I have a toy example where the goal is to process a 3x3 matrix of integers on the GPU and double each element. I have no problem doing this when I let CUDAnative linearize the array into a vector, but processing the array as a 3x3 on the GPU is puzzling. Here is my toy example, which produces the right answer but for the wrong reason.

```julia
using CUDAdrv, CUDAnative

function kernel_mmul(a, c)
    # global (row, column) index of this thread, 1-based
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    j = (blockIdx().y-1) * blockDim().y + threadIdx().y
    c[i,j] = a[i,j] * 2   # double the element
    @cuprintf(" %d %d %d %d\n",i,j,c[i,j],threadIdx().y)
    return nothing
end

dev = CuDevice(0)
ctx = CuContext(dev)
a = Int32[1 2 3; 2 3 1; 3 1 2]
d_a = CuArray(a)
d_c = similar(d_a)
# launch a 1x1 grid of blocks, each with 3x3 threads
@cuda ((1,1),(3,3)) kernel_mmul(d_a, d_c)
c = Array(d_c)
println(a)
println(c)
destroy(ctx)
```

For some reason, the index j printed by the kernel is always zero, so I guess multiple blocks of i are processed to get the answer. The iteration count is correct each time, and the result is correct as long as the launch does not fail because of a poor choice of grid and block dimensions. Also of note: blockDim seems to be zero, which is counterintuitive.

maleadt · February 3, 2017, 4:37pm

Your index calculation is correct, but subtracting the literal 1 (an Int64) promotes the result to Int64, which means your %d format specifiers are wrong. Either use %ld, or subtract Int32(1) instead.
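The promotion described in this answer can be checked on the CPU, without a GPU. This minimal sketch assumes 64-bit Julia (where the literal `1` is an `Int64`) and uses plain `Int32` variables as hypothetical stand-ins for the `blockIdx().x`, `blockDim().x` and `threadIdx().x` fields, which the answer implies are `Int32`.

```julia
# CPU-only sketch: mimic the kernel's index arithmetic with Int32 values.
# block_x, dim_x and thread_x are hypothetical stand-ins for the GPU intrinsics.
block_x  = Int32(1)   # stand-in for blockIdx().x (1-based)
dim_x    = Int32(3)   # stand-in for blockDim().x
thread_x = Int32(2)   # stand-in for threadIdx().x

# Subtracting the literal 1 (an Int64 on 64-bit Julia) promotes the result.
i_promoted = (block_x - 1) * dim_x + thread_x
println(typeof(i_promoted))   # Int64 on a 64-bit system

# Subtracting Int32(1) keeps the whole expression in Int32.
i_kept = (block_x - Int32(1)) * dim_x + thread_x
println(typeof(i_kept))       # Int32

@assert i_promoted == i_kept == 2   # same value, different width
```

The value is identical either way; only the width changes, and the width is exactly what a C-style format string such as `%d` versus `%ld` cares about.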

Relevant issue: https://github.com/JuliaGPU/CUDAnative.jl/issues/25

Colin_Beckingham · February 3, 2017, 5:47pm

Wow! You are right. The format string " %d %ld %d %d\n" works. It may be informative that the variable i seems indifferent to whether it is printed as Int32 or Int64, while j is very sensitive to it, even though both use exactly the same value and type for the -1 term.
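One plausible explanation for why only j looked wrong rests on assumptions not stated in the thread: that `@cuprintf` hands its arguments to the device-side printf as a packed buffer with natural alignment, and that the GPU is little-endian. Under those assumptions, i and j each occupy 8 bytes. With `" %d %d %d %d"`, the first `%d` reads the low 4 bytes of i (its correct value), the second reads the high 4 bytes of i (0 for small indices), the third reads j, and the fourth reads the high half of j (0 again). With `%ld` in the second slot, the reader jumps to the 8-byte-aligned offset where j begins, so every field lines up. The CPU-only sketch below emulates that walk; `pack_args` and `read_args` are hypothetical helpers, and the argument values (i = 2, j = 3, c[i,j] = 2, threadIdx().y = 3) are only an example.

```julia
# CPU-only emulation of a C-style printf walking a packed argument buffer.
# Assumptions (not from the thread): natural alignment, little-endian layout,
# %d consumes 4 bytes and %ld consumes 8 bytes.

# Hypothetical helper: pack one thread's arguments. i and j are Int64
# (promoted by the -1), while c[i,j] and threadIdx().y are Int32.
function pack_args(i::Int64, j::Int64, c::Int32, ty::Int32)
    io = IOBuffer()
    write(io, i); write(io, j)    # offsets 0 and 8, 8 bytes each
    write(io, c); write(io, ty)   # offsets 16 and 20, 4 bytes each
    return take!(io)
end

# Hypothetical helper: read values per specifier (:d = 4 bytes, :ld = 8 bytes),
# aligning each read to its own size as a naive varargs reader would.
function read_args(buf::Vector{UInt8}, specs::Vector{Symbol})
    offset = 0
    out = Int64[]
    for s in specs
        n = s === :ld ? 8 : 4
        offset = cld(offset, n) * n              # round up to n-byte alignment
        T = n == 8 ? Int64 : Int32
        push!(out, reinterpret(T, buf[offset+1:offset+n])[1])  # native byte order
        offset += n
    end
    return out
end

buf = pack_args(Int64(2), Int64(3), Int32(2), Int32(3))
println(read_args(buf, [:d, :d, :d, :d]))    # [2, 0, 3, 0] on a little-endian host
println(read_args(buf, [:d, :ld, :d, :d]))   # [2, 3, 2, 3] on a little-endian host
```

Under these assumptions the first field only prints correctly because i is small and its low half comes first in memory, so using `%ld` for both i and j remains the correct fix.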

maleadt · February 3, 2017, 9:56pm

Huh, curious. You should be using %ld for both, though. Or, even better, keep i and j 32-bit (although it doesn't matter much in this case).
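To make the "keep i and j 32-bit" suggestion concrete, here is a CPU-only emulation of the launch from the question (one block of 3x3 threads). `emulate_launch` is a hypothetical helper whose nested loops stand in for the GPU's threads; it is not a CUDAnative API. It reuses the kernel's index formula but subtracts `Int32(1)`, and asserts that i and j stay `Int32`.

```julia
# CPU-only emulation of `@cuda ((1,1),(3,3)) kernel_mmul(d_a, d_c)`.
# emulate_launch is hypothetical: its loops play the role of GPU threads.
function emulate_launch(a::Matrix{Int32}; blocks=(1, 1), threads=(3, 3))
    c = similar(a)
    one32 = Int32(1)
    bdx, bdy = Int32(threads[1]), Int32(threads[2])   # stand-ins for blockDim()
    for bx in one32:Int32(blocks[1]), by in one32:Int32(blocks[2])   # blockIdx()
        for tx in one32:bdx, ty in one32:bdy                         # threadIdx()
            # Kernel's index formula, subtracting Int32(1) so nothing is promoted.
            i = (bx - one32) * bdx + tx
            j = (by - one32) * bdy + ty
            @assert i isa Int32 && j isa Int32
            c[i, j] = a[i, j] * Int32(2)   # element-wise doubling
        end
    end
    return c
end

a = Int32[1 2 3; 2 3 1; 3 1 2]
c = emulate_launch(a)
println(c)   # Int32[2 4 6; 4 6 2; 6 2 4]
```

Each (tx, ty) pair maps to a distinct (i, j), so all nine elements are written exactly once, matching the element-wise doubling the question is after.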
