Passing array of pointers to CUDA kernel

I need to be able to stack arbitrarily sized CuArrays and CuTextures as a single input to a kernel, so that I can loop over the data inside the kernel, like so:

(to run this example, get CuTextures.jl from GitHub: cdsousa/CuTextures.jl [DEPRECATED, moved into CUDA.jl], a CUDA textures ("CUDA arrays") interface for native Julia)

```julia
using CuTextures, CuArrays, CUDAnative, CUDAdrv

myImages = [CuTexture(CuTextureArray(CuArrays.rand(2, 2))) for i = 1:20];
out = CuArrays.rand(1);

function myKernel!(myImages, out)
  for i = 1:length(myImages)
    out += myImages[i](1.3, 1.2)
  end
  return nothing
end

@cuda threads=1 blocks=1 myKernel!(myImages, out)
```

The above errors with “passing and using non-bitstype argument”.

Is there a way to do this right now, or will there be?

My application requires n-dimensional stacks of 2D images. The reason I cannot simply concatenate them into a 3D array is the resulting memory-allocation problem: I would have to reallocate and copy the whole stack just to replace one of many images.

This is a CPU array of CuTexture objects, which you can’t pass to the GPU. You need a CuArray instead.

You also need to know that GPU objects you instantiate on the CPU (e.g. CuArray, CuTexture) don’t get copied as-is to the GPU. They get converted at the time of @cuda to a device-side counterpart (e.g. CuDeviceArray, CuDeviceTexture). This conversion does not automatically happen for the elements of an array, so you will need to call cudaconvert explicitly on the CuTexture objects. If you were to pass a Tuple instead, the elements would get auto-converted (but using Tuples has other trade-offs, of course).
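As a minimal sketch of the Tuple alternative mentioned above (hypothetical, untested here since it needs a CUDA-capable GPU; `krn_tuple` is an assumed name, and the CuTextures.jl/CuArrays setup is the same as in the question):

```julia
using CuTextures, CuArrays, CUDAnative

# Elements of a Tuple argument are cudaconvert-ed automatically by @cuda,
# so no explicit conversion of the CuTexture objects is needed.
imgs = Tuple(CuTexture(CuTextureArray(CuArrays.rand(2, 2))) for i = 1:20)
out = CuArrays.zeros(1)

function krn_tuple(ims, out)
    val = out[1]
    for i = 1:length(ims)
        val += ims[i](1.2f0, 1.2f0)   # sample each texture at (1.2, 1.2)
    end
    out[1] = val
    return nothing
end

@cuda threads=1 blocks=1 krn_tuple(imgs, out)
```

One trade-off to be aware of: the kernel specializes on the tuple's type, which includes its length, so changing the number of images triggers recompilation.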


OK, for reference, the following works. Thanks again for the help, Tim.

```julia
function krn(im, out)
  val = out[1]
  for i = 1:length(im)
    val += im[i](Float32(1.2), Float32(1.2))
  end
  out[1] = val
  return nothing
end
```

```julia
out = CuArrays.rand(1);

# cudaconvert each CuTexture to its device-side counterpart explicitly,
# then store them in a CuArray so the collection itself lives on the GPU
im = CuArray([cudaconvert(CuTexture(CuTextureArray(CuArrays.rand(2, 2)))) for i = 1:20]);

@cuda threads=1 blocks=1 krn(im, out)
```