I modified the example from the CUDAnative README as follows. Am I using `@cuprintf` the wrong way?
```julia
function kernel_vadd(a, b, c)
    # global linear index of this thread (Julia indices start at 1)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    @cuprintf("blockx= %d, blockDx=%d, threadid=%d\n", (blockIdx().x-1), blockDim().x, threadIdx().x)
    c[i] = a[i] + b[i]
    return nothing
end

a = round.(rand(Float32, (3, 4)))
b = round.(rand(Float32, (3, 4)))
d_a = CuArray(a)
d_b = CuArray(b)
d_c = similar(d_a)  # output array
```
```
julia> @cuda threads=12 kernel_vadd(d_a, d_b, d_c)
julia> blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
blockx= 0, blockDx=0, threadid=12
```
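One possible explanation, assuming that `blockIdx().x-1`, `blockDim().x` and `threadIdx().x` are 64-bit `Int` values while `%d` reads a 32-bit `int` as in C `printf`: each `%d` then consumes only half of an argument. The CPU-only sketch below (no GPU needed) takes hypothetical argument values for a single thread and reinterprets them as 32-bit words, which is roughly what three `%d` specifiers would read on a little-endian machine. The thread index value `5` is arbitrary.

```julia
# Hypothetical values for one thread: blockIdx().x-1 == 0, blockDim().x == 12,
# threadIdx().x == 5, assumed to reach @cuprintf as 64-bit Int values
args = Int64[0, 12, 5]

# View the same bytes as 32-bit words, the size each %d specifier consumes
words = reinterpret(Int32, args)

# On a little-endian machine the first three words are the low and high halves
# of the first argument, then the low half of blockDim().x
println(collect(words[1:3]))
```

This yields `0, 0, 12` whatever the thread index is, which matches the pattern in the output above. If that is the cause, a 64-bit specifier such as `%ld`, or converting the arguments with `Int32(...)` so they match `%d`, should print the intended values.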