Strange behavior of CuDeviceArrays

Here is a minimal example of my use case:

using CUDA

struct Foo{T}
    x :: T
end

function kernel(y, src)
    id = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i in id:stride:length(y)
        y[i] = src.x[i]
    end
    return nothing
end

N = 10
y = CUDA.zeros(N)

a = Foo(CUDA.ones(N))
# @cuda threads=N kernel(y, a)
# KernelError: passing and using non-bitstype argument

b = Foo(cudaconvert(CUDA.ones(N)))
@cuda threads=N kernel(y, b)
@show y
# y = Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

With manually converted CuDeviceArrays it is possible to pass custom structs into kernels. Please tell me if there are other ways to do the same thing.
The problem is that if I later try to access the x field of Foo elsewhere in the code, I get a read-only memory error.
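
One possible alternative, sketched here rather than offered as the definitive answer, is to leave the CuArray inside the struct and let Adapt.jl do the conversion at launch time. `@cuda` passes each argument through `cudaconvert`, which relies on Adapt.jl, so a struct that Adapt knows how to rebuild should be converted automatically. This sketch assumes Adapt.jl is available in the active environment (it is a dependency of CUDA.jl, but may need to be added to the project explicitly).

```julia
using CUDA
using Adapt

struct Foo{T}
    x::T
end

# Teach Adapt.jl to rebuild Foo by adapting each of its fields.
Adapt.@adapt_structure Foo

function kernel(y, src)
    id = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i in id:stride:length(y)
        y[i] = src.x[i]
    end
    return nothing
end

N = 10
y = CUDA.zeros(N)

# The host-side struct keeps holding an ordinary CuArray.
a = Foo(CUDA.ones(N))

# At launch, @cuda converts `a` into a Foo wrapping a CuDeviceArray.
@cuda threads=N kernel(y, a)

# Copy results back to the CPU for inspection.
@show Array(y)

# a.x was never replaced by a device-side array, so it can still be
# used from host code like any other CuArray.
@show Array(a.x)
```

With this approach the device-side array only exists for the duration of the kernel call, so the struct stored on the host never contains a CuDeviceArray that host code might try to index.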