Arrays of arrays and arrays of structures in CUDA kernels cause random errors

Thank you, I have read the docs for GC.@preserve and this interesting discussion: On the garbage collection. I hope I now understand what causes the error in my case. However, it is not clear to me how I should use GC.@preserve with an array of arrays. Here is a much simpler MWE:

using CUDA

function kernel(a, bb)
    id = threadIdx().x
    a[id] = sum(bb[id])
    return nothing
end


N = 10

a = CUDA.zeros(N)

b = Array{CuArray}(undef, N)
for i=1:N
    b[i] = CUDA.ones(2)
end


# This can potentially cause an error (nothing is preserved during the kernel)
bb = CuArray([cudaconvert(b[i]) for i=1:N])
@cuda threads=N kernel(a, bb)

# Option 1:
bb = CuArray([cudaconvert(b[i]) for i=1:N])
GC.@preserve b begin
    @cuda threads=N kernel(a, bb)
end

# Option 2:
btmp = [cudaconvert(b[i]) for i=1:N]
bb = CuArray(btmp)
GC.@preserve btmp begin
    @cuda threads=N kernel(a, bb)
end

In this example I do not understand what exactly I should preserve: the original array of CuArrays b, the temporary array of CuDeviceArrays [cudaconvert(b[i]) for i=1:N], or both of them.
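
For reference, here is a minimal sketch of the variant that looks safest. It rests on one assumption: the CuDeviceArray returned by cudaconvert is a plain isbits wrapper around a device pointer and holds no reference to its parent CuArray. If that holds, preserving btmp alone would not keep the device memory alive, so b is what has to stay rooted. The explicit loop in the kernel and the synchronize() call inside the block are extra precautions I added; they are not in the original MWE.

```julia
using CUDA

# Each thread sums one inner device array with an explicit loop.
function kernel(a, bb)
    id = threadIdx().x
    inner = bb[id]
    s = zero(eltype(a))
    for j in 1:length(inner)
        s += inner[j]
    end
    a[id] = s
    return nothing
end

N = 10
a = CUDA.zeros(N)
b = [CUDA.ones(2) for _ in 1:N]   # concretely typed Vector of CuArrays

# Keep the owning CuArrays in b rooted for the whole launch.
GC.@preserve b begin
    bb = CuArray([cudaconvert(x) for x in b])
    @cuda threads=N kernel(a, bb)
    synchronize()   # kernel launches are asynchronous, so wait before leaving the block
end
```

Whether the synchronize() is strictly required depends on how CUDA.jl orders frees relative to running kernels, so treat it as a conservative choice rather than a confirmed requirement.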

Sorry, it is very hard to debug such code, since the GC manages to collect the original objects only if the kernel has been running long enough. Can I mimic the GC behaviour and trigger the error myself, e.g. with something like finalize(b)?
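
As a rough, CPU-only illustration of what finalize does in Base Julia: it runs the finalizers registered on the object it is given, and nothing more. FakeBuffer and make_buffer below are hypothetical stand-ins, not CUDA.jl types. The sketch suggests that finalize(b) on the outer array would leave its elements untouched, so each b[i] would presumably have to be finalized individually. I have not verified that doing so reproduces the kernel error.

```julia
# Hypothetical stand-in for an object that owns a resource (not a CUDA.jl type).
mutable struct FakeBuffer
    freed::Bool
end

function make_buffer()
    buf = FakeBuffer(false)
    # The finalizer only flips a flag; a real one would release memory.
    finalizer(x -> (x.freed = true), buf)
    return buf
end

bufs = [make_buffer() for _ in 1:3]

# The outer Vector has no finalizer of its own, so the elements are untouched.
finalize(bufs)
@assert !any(x -> x.freed, bufs)

# Finalizing each element runs its finalizer immediately.
foreach(finalize, bufs)
@assert all(x -> x.freed, bufs)
```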