How to use `copy!` (or other array operations) inside a CUDA kernel?

Note that there is some level of broadcasting that works in kernels if you use StaticArrays, e.g.

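using CUDA, StaticArrays

Y = CUDA.rand(10, 2)

function kernel(Y)
    # Static vector with a compile-time size, so no dynamic allocation in the kernel
    X = @SVector zeros(10)
    # Broadcast into the first column of Y
    Y[:, 1] .= X .+ 1
    nothing
end

@cuda kernel(Y)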

although I couldn’t get your exact example working. Based on:

maybe it’s possible? Perhaps @mateuszbaran can comment further; I’m curious myself.
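
If `copy!` itself turns out not to be usable in a kernel, one fallback is to write the copy as explicit indexing, with one thread per element. This is only a minimal sketch, assuming 1D arrays of equal length; `copy_kernel!`, `src`, and `dst` are names made up here for illustration, not taken from the original question:

```julia
using CUDA

# Hypothetical kernel: each thread copies a single element from src to dst.
function copy_kernel!(dst, src)
    # Global 1-based thread index
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(dst)
        @inbounds dst[i] = src[i]
    end
    return nothing
end

src = CUDA.rand(10)
dst = CUDA.zeros(10)

# One thread per element; a single block is enough for 10 elements
@cuda threads=length(dst) copy_kernel!(dst, src)
```

The bounds check keeps the kernel safe if it is launched with more threads than elements.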