How to make `Colon()` type-stable in a CUDA kernel

Hi! I’m wondering whether there is any way to make Colon() type-stable when it is used inside a CUDA kernel. For example (I’ve simplified the sample case here, as the real one is complex):

# Helper function
function view_helper(u, arg1, arg2, arg3)

    idx1 = (arg1 == 1) ? size(u, 2) * isequal(arg2, 1) : Colon()
    idx2 = (arg1 == 2) ? size(u, 3) * isequal(arg2, 1) : Colon()
    idx3 = (arg1 == 3) ? size(u, 4) * isequal(arg2, 1) : Colon()

    return view(u, :, idx1, idx2, idx3, arg3)
end

# CUDA kernel
function kernel!(u)

    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    j = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    k = (blockIdx().z - 1) * blockDim().z + threadIdx().z

    if (...)
        # Compute `arg1`, `arg2`, and `arg3`
        new_u = view_helper(u, arg1, arg2, arg3)
        # Use `new_u` to do something
    end

    return nothing
end

I defined the helper function to avoid having too many if/else conditions in the kernel (as they would hurt performance). But the kernel fails to run, whereas if I change Colon() to some fixed number it runs successfully. So I guess the issue is the type stability of Colon() inside the CUDA kernel. Has anyone else run into the same issue? And how can I solve it? Thanks!

In this line the type of idx1 depends on a runtime value: it is an integer in one branch and a Colon in the other. You can’t do that on the GPU; the type needs to be fixed at compile time.

idx1 = (arg1 == 1) ? size(u, 2) * isequal(arg2, 1) : Colon()

So instead of Colon, you could return a UnitRange from both branches of the conditional: explicitly pass the whole axis range (e.g. 1:size(u, 2)) rather than Colon, and a one-element range rather than a bare integer. Then the type remains constant and only the runtime values change.
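As a minimal sketch of that suggestion, the helper could look something like the following. The name `view_helper_stable` is hypothetical, the index values simply mirror the simplified example in the question, and it is shown on a plain CPU `Array` so it can be checked without a GPU. It assumes the same function body would be called from inside the kernel.

```julia
# Hypothetical type-stable variant of `view_helper` (name is illustrative).
# Every branch yields a UnitRange{Int}, so the type of the returned view
# no longer depends on the runtime values of `arg1` and `arg2`.
function view_helper_stable(u, arg1, arg2, arg3)
    n1, n2, n3 = size(u, 2), size(u, 3), size(u, 4)

    # Same scalar positions as in the original example.
    i1 = n1 * isequal(arg2, 1)
    i2 = n2 * isequal(arg2, 1)
    i3 = n3 * isequal(arg2, 1)

    # One-element range when the dimension is selected, full axis otherwise.
    idx1 = (arg1 == 1) ? (i1:i1) : (1:n1)
    idx2 = (arg1 == 2) ? (i2:i2) : (1:n2)
    idx3 = (arg1 == 3) ? (i3:i3) : (1:n3)

    return view(u, :, idx1, idx2, idx3, arg3)
end

# CPU check: the view type is identical whichever dimension is selected.
u = reshape(collect(1.0:240.0), 2, 3, 4, 5, 2)
v = view_helper_stable(u, 2, 1, 1)
w = view_helper_stable(u, 3, 1, 2)
@assert typeof(v) == typeof(w)
```

One difference from indexing with a bare integer: a one-element range keeps the selected dimension with length 1 instead of dropping it, so `size(v)` is `(2, 3, 1, 5)` here and any code that consumes the view needs to index that extra dimension.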