Hello,
I just started programming with CUDA.jl and would like to understand how to vectorize an operation in a GPU kernel. Given some function func(), is there a way to apply it element-wise to a CuVector or, even better, to a general CuArray? Are there any conditions func() must satisfy for this to work, such as operating in place or avoiding scalar indexing?
Specifically, I’d like to vectorize a custom mod p^n function so that I can apply it to matrices stored on the GPU, but I’d also appreciate advice that applies to any general function. Thank you!
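
For concreteness, here is a minimal sketch of the kind of thing I am hoping to do. The name `modpn` and the values of `p` and `n` are hypothetical stand-ins for my actual function, and I am assuming that ordinary dot-broadcasting is the right way to apply a scalar function to a CuArray. I compute the modulus p^n once on the host so that the function running on the GPU only does a plain integer `mod`.

```julia
using CUDA

# Hypothetical stand-in for my function: reduce x modulo m, where m = p^n.
# It is a plain scalar function: no allocation, no array indexing.
modpn(x, m) = mod(x, m)

p, n = Int32(3), 2
m = p^n                                # modulus computed once on the host (Int32)

A = rand(Int32(0):Int32(999), 4, 4)    # small test matrix on the CPU
B_cpu = modpn.(A, m)                   # CPU reference via ordinary broadcasting

if CUDA.functional()                   # only touch the GPU if one is usable
    dA = CuArray(A)                    # copy the matrix to the GPU
    dB = modpn.(dA, m)                 # same broadcast syntax, applied element-wise
    @assert Array(dB) == B_cpu         # copy back and compare with the CPU result
end
```

Is this the intended approach, or does a custom function like this need a hand-written kernel instead?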