This is an example of the use case:
```julia
using CUDA

struct Foo{T}
    x::T
end

function kernel(y, src)
    # grid-stride loop over all elements of y
    id = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i = id:stride:length(y)
        y[i] = src.x[i]
    end
    return nothing
end

N = 10
y = CUDA.zeros(N)

a = Foo(CUDA.ones(N))
# @cuda threads=N kernel(y, a)
# KernelError: passing and using non-bitstype argument

# Keep a reference to the CuArray: the CuDeviceArray stored in `b` only holds
# a pointer, so the CuArray must not be garbage-collected while `b` is in use.
xs = CUDA.ones(N)
b = Foo(cudaconvert(xs))
@cuda threads=N kernel(y, b)
@show y
# y = Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```
By manually converting the field to a `CuDeviceArray` with `cudaconvert`, it is possible to pass a custom struct into a kernel. Are there other ways to do the same?
However, if I later want to use the `x` field of `b` on the host, I get a read-only memory error.
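One commonly used alternative, sketched here under the assumption that Adapt.jl is installed in the active environment (CUDA.jl depends on it, but it must be added to the project to be loaded with `using Adapt`), is to register an Adapt rule for `Foo`. `@cuda` runs `cudaconvert` on every kernel argument, and `cudaconvert` goes through Adapt.jl, so with the rule in place a `Foo` holding a `CuArray` is rebuilt as a `Foo` holding a `CuDeviceArray` only at launch time. The host-side object `a` keeps its ordinary `CuArray`, so its contents can still be read on the host.

```julia
using CUDA
using Adapt

# Same struct as in the question; on the host it stores a CuArray.
struct Foo{T}
    x::T
end

# Generates Adapt.adapt_structure(to, obj::Foo) = Foo(adapt(to, obj.x)).
# Equivalent hand-written form:
#   Adapt.adapt_structure(to, f::Foo) = Foo(adapt(to, f.x))
Adapt.@adapt_structure Foo

function kernel(y, src)
    # grid-stride loop over all elements of y
    id = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i = id:stride:length(y)
        y[i] = src.x[i]
    end
    return nothing
end

N = 10
y = CUDA.zeros(N)
a = Foo(CUDA.ones(N))         # a.x stays a CuArray on the host side
@cuda threads=N kernel(y, a)  # converted to Foo{<:CuDeviceArray} at launch
@show y
@show Array(a.x)              # host-side access to the field still works
```

Because the conversion happens per launch, there is no separate device-side object to keep alive and no need to call `cudaconvert` by hand.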