Hello,
I’ve begun exploring the CUDAnative package with modest success, but I’ve run into a couple of limitations.
I wanted to allocate an array from within my kernel, as shown in some of the project examples. The first limitation appears to be that this must be a one-dimensional collection. OK: I have a 2D grid and want to perform a vector operation for each grid cell, so I need to use a linear array for what I’d ideally index as 3D. I wrote a function to do this; it takes only integers from the block/grid dimensions and adds/multiplies them.
I can see that CUDA is giving me the results back as floats (if I attempt to store this converted index as grid output, it will crash if the container expects Int32). I’ve attempted adding type annotations and casting, but neither of those compiles.
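As a point of comparison, here is a CPU-only sketch of how Julia's integer promotion behaves when an Int32 value meets an integer literal. Whether this is what actually happens inside the kernel is an assumption that has not been verified on the GPU, and the variable names are purely illustrative.

```julia
# CPU-only illustration of Julia's integer promotion rules; no GPU required.
x = Int32(3)

# Mixing an Int32 with an integer literal promotes to the platform's Int
# (Int64 on a 64-bit machine).
widened = x - 1
@show typeof(widened)

# Using one(x) keeps the arithmetic in the type of x.
narrow = x - one(x)
@show typeof(narrow)

# An explicit conversion also works on the CPU, but it is a checked conversion
# that throws InexactError if the value does not fit in the target type.
converted = Int32(widened)
@show typeof(converted)
```

If the index helper is written with `one(x)` instead of the literal `1`, Int32 inputs produce an Int32 result on the CPU; this sketch does not establish that the same change removes the problem in the kernel.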
I have a minimal working example below. My questions are:
- Given the goal of allocating/using something like a multidimensional array, is there a better way to proceed? Are there index functions to handle this?
- Are there ever times when I can cast/annotate types onto my variables?
Random potentially useful tidbits:
Julia version 1.0.3
Geforce 755m
CUDAnative 1.0.0
CUDAdrv 1.0.1
CuArrays 0.9.0
Thanks,
Alex
Example:
using CUDAdrv, CUDAnative, CuArrays
function l_0(x, y, z, w, h)
    return x + y*w + z*w*h
end
function l(x, y, z, w, h)
    _x = x - 1
    _y = y - 1
    _z = z - 1
    return l_0(_x, _y, _z, w, h) + 1
end
function kernel(out)
    x = blockIdx().x
    y = blockIdx().y
    w = gridDim().x
    h = gridDim().y
    z = 1
    # apparently @cuDynamicSharedMem can only be 1 dimensional?
    arr = @cuDynamicSharedMem(Int32, w * h * 3)
    # it allows me to say `linear_index :: Int..` but not `linear_index :: Int32..`
    linear_index = l(x,y,z,w,h)
    out[x, y] = linear_index
    return nothing
end
function make_matrix(width :: Int, height :: Int)
    grid = (width, height)
    threads = (1,)
    # I can't change this Float32 -> Int32
    cu_out = CuArray{Float32, 2}(undef, width, height)
    @cuda blocks=grid threads=threads kernel(cu_out)
    out = Array{Float32, 2}(cu_out)
    return out
end
function main()
    width = 10
    height = 10
    matrix = make_matrix(width, height)
    println(matrix)
end
main()
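
The index helper above can be checked on the CPU against `LinearIndices` from Base, which provides the same column-major mapping for ordinary arrays. This is a minimal sketch under two assumptions: the depth `d = 3` is taken from the `* 3` in the shared-memory allocation, and the helper is rewritten with `one(x)` so that it preserves the input integer type. It does not check whether `LinearIndices` compiles inside a CUDAnative kernel on these package versions.

```julia
# CPU-only check of the 3D -> linear index mapping; no GPU required.

# Zero-based linear offset for a column-major layout of width w and height h.
l_0(x, y, z, w, h) = x + y * w + z * w * h

# One-based wrapper around l_0; one(x) keeps the result in the type of x.
function l(x, y, z, w, h)
    o = one(x)
    return l_0(x - o, y - o, z - o, w, h) + o
end

# Assumed sizes: a 10 x 10 grid with a depth of 3.
w, h, d = 10, 10, 3
reference = LinearIndices((w, h, d))

# Compare the hand-written helper with Base's mapping at every position.
matches = all(l(x, y, z, w, h) == reference[x, y, z] for x in 1:w, y in 1:h, z in 1:d)
@show matches

# With Int32 inputs throughout, the result stays Int32.
@show typeof(l(Int32(2), Int32(3), Int32(1), Int32(10), Int32(10)))
```

For the inverse direction, `CartesianIndices((w, h, d))[i]` recovers the `(x, y, z)` triple from a linear index `i` on the CPU.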