Hi! I’m wondering whether there is any way to make `Colon()` type-stable when it is used inside a CUDA kernel. For example (I’ve simplified the sample case here, as the practical one is complex):
```julia
# Helper function
function view_helper(u, arg1, arg2, arg3)
    idx1 = (arg1 == 1) ? size(u, 2) * isequal(arg2, 1) : Colon()
    idx2 = (arg1 == 2) ? size(u, 3) * isequal(arg2, 1) : Colon()
    idx3 = (arg1 == 3) ? size(u, 4) * isequal(arg2, 1) : Colon()
    return view(u, :, idx1, idx2, idx3, arg3)
end

# CUDA kernel
function kernel!()
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    j = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    k = (blockIdx().z - 1) * blockDim().z + threadIdx().z

    if (...)
        # Compute `arg1`, `arg2`, and `arg3`
        new_u = view_helper(u, arg1, arg2, arg3)
        # Use `new_u` to do something
    end

    return nothing
end
```
I defined a helper function here to avoid using too many `if`/`else` conditions in the kernel (as they would hurt performance). However, the kernel fails to run, whereas if I replace `Colon()` with a fixed number it runs successfully. So I guess the issue is the type stability of `Colon()` when it is used inside the CUDA kernel. Has anyone else run into the same issue? And how can I solve it? Thanks!
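
To show what I mean by stability, here is a minimal CPU-only sketch (no CUDA needed). `pick_index` is a hypothetical stand-in for one of the `idx` lines in `view_helper`, not code from my real project. My assumption, which I have not confirmed, is that this branch-dependent return type is what the GPU compiler cannot handle:

```julia
# Hypothetical stand-in for one `idx` line of `view_helper`:
# one branch yields an integer, the other yields `Colon()`.
pick_index(u, arg1, arg2) = (arg1 == 1) ? size(u, 2) * isequal(arg2, 1) : Colon()

u = zeros(Float32, 4, 4)

# The concrete type of the result depends on the runtime value of `arg1`.
@show typeof(pick_index(u, 1, 1))   # integer branch
@show typeof(pick_index(u, 2, 1))   # `Colon()` branch

# Ask inference what it can conclude from the argument types alone.
@show Base.return_types(pick_index, (Matrix{Float32}, Int, Int))
```

Because an integer index drops a dimension and `Colon()` keeps it, the `view` built from these indices would also have a different type in each branch, which is why a fixed number behaves differently from `Colon()` here.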