How to vectorize any function on the GPU with CUDA.jl?


I just started programming with CUDA.jl and would like to understand how to vectorize an operation in a GPU kernel. In particular, given some function func(), is there a way to vectorize this function so that it applies element-wise to a CuVector, or even better, a CuArray? Are there any conditions func() must satisfy, such as being in-place or avoiding scalar indexing, for this to work?

In particular, I’d like to vectorize a custom mod p^n function so that I can apply it to CUDA matrices on the GPU, but I’d appreciate advice about trying to do so with any general function. Thank you!

By using broadcasting, i.e. func.(x::CuArray), or map.
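For example, a custom mod p^n function (here a hypothetical `modpn`, just for illustration) broadcasts over a CuArray the same way it would over a regular Array; broadcasting fuses the operation into a single GPU kernel, so the function only needs to be GPU-compatible (pure scalar code, no dynamic allocation or scalar indexing into other arrays). A minimal sketch, assuming a working CUDA.jl installation:

```julia
using CUDA

# Hypothetical element-wise function: reduce x modulo p^n
modpn(x, p, n) = mod(x, p^n)

a = CuArray([10, 20, 30, 40])

# Broadcasting applies modpn element-wise on the GPU
b = modpn.(a, 3, 2)              # each element mod 3^2 = 9

# map works the same way
c = map(x -> modpn(x, 3, 2), a)
```

Both calls run entirely on the device; no hand-written kernel is needed for element-wise operations like this.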


Thank you for the response, but I am still unable to get this working. Here is a simple program where I try to use the broadcasted `.+=` operator:

a = CuArray([1,2,3,4])

a .+= 1

When I run it, I get the following error:

I have tried using map() as well, and have tried changing the way I define a, such as by using CUDA.ones() or CUDA.fill(), only to get a similar error.

Here is my CUDA version:

julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 12.3
NVIDIA driver 546.12.0

- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 12.0.0+546.12

- Julia: 1.10.2
- LLVM: 15.0.7
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3060 (sm_86, 11.256 GiB / 12.000 GiB available)

I would appreciate further advice on this.

You have GPUCompiler.jl v0.17.3 installed, which is a very old version that’s not compatible with Julia 1.10. Please upgrade your packages and try again.
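For reference, one way to upgrade and then check the result from the Julia REPL (assuming the default active environment):

```julia
using Pkg

Pkg.update()                  # upgrade all packages in the active environment
Pkg.status("GPUCompiler")     # verify the resulting GPUCompiler.jl version
```

If the version still doesn't move, something in the environment is likely holding it back; `Pkg.status()` on the full environment can help spot the offending compat bound.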