Hi! I’m wondering whether there is any way to make `Colon()` type-stable when calling a CUDA kernel. For example (I’ve simplified the case here, as the real one is complex):
```julia
# Helper function
function view_helper(u, arg1, arg2, arg3)
    idx1 = (arg1 == 1) ? size(u, 2) * isequal(arg2, 1) : Colon()
    idx2 = (arg1 == 2) ? size(u, 3) * isequal(arg2, 1) : Colon()
    idx3 = (arg1 == 3) ? size(u, 4) * isequal(arg2, 1) : Colon()

    return view(u, :, idx1, idx2, idx3, arg3)
end

# CUDA kernel
function kernel!()
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    j = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    k = (blockIdx().z - 1) * blockDim().z + threadIdx().z

    if (...)
        # Compute `arg1`, `arg2`, and `arg3`
        new_u = view_helper(u, arg1, arg2, arg3)
        # Use `new_u` to do something
    end

    return nothing
end
```
I defined the helper function to avoid too many `if`/`else` branches in the kernel (as they would hurt performance). However, the kernel fails to run, whereas if I replace `Colon()` with a fixed number it runs successfully. So I guess the issue is the type stability of `Colon()` inside the CUDA kernel. Has anyone else run into the same issue? And how can I solve it? Thanks!
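For reference, the suspected instability can be checked on the CPU, without a GPU. This is only a minimal sketch: `pick` is a hypothetical one-index version of the helper (not part of my real code), and the array size is made up. It shows that the ternary yields an integer in one branch and a `Colon` in the other, so the index type depends on runtime values:

```julia
# CPU-only illustration; `pick` and the array shape are assumptions for this sketch.
# Same pattern as one line of the helper: an integer index or `Colon()`.
pick(u, arg1, arg2) = (arg1 == 1) ? size(u, 2) * isequal(arg2, 1) : Colon()

u = zeros(2, 3, 4, 5, 6)

# The concrete type of the result depends on the runtime value of `arg1`.
t_int   = typeof(pick(u, 1, 1))   # an integer type
t_colon = typeof(pick(u, 2, 1))   # Colon

# What the compiler infers when it only knows the argument types.
inferred = only(Base.return_types(pick, (typeof(u), Int, Int)))

println(t_int, " / ", t_colon, " / ", inferred)
```

If the inferred type is a union of the two, then the type of the `view` built from such indices presumably cannot be determined at compile time either, which would be consistent with the kernel only working when `Colon()` is replaced by a fixed number.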