CUDA.jl kernel is half as fast as C++ kernel

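It won’t fix everything, but you should probably use cis instead of exp(im*...). Another main difference between Julia and C++ here is that Julia doesn’t automatically reassociate your math (for example, it won’t contract a*b + c into a fused multiply-add on its own), so you might want to use the MuladdMacro package, whose @muladd macro rewrites such expressions into muladd calls that allow this.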
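As a rough sketch of both suggestions (the function names and values here are made up for illustration and are not taken from the original kernel), this is what the two changes look like in plain Julia. It uses Base's muladd directly, which is what @muladd would rewrite a*b + c into, so it runs without any extra packages:

```julia
# Phase factor two ways. cis(x) computes cos(x) + im*sin(x) directly,
# while exp(im*x) goes through the general complex exponential.
phase_exp(x) = exp(im * x)
phase_cis(x) = cis(x)

# Hypothetical accumulation step. The plain version is evaluated as a
# separate multiply and add; muladd gives the compiler permission to
# emit a single fused multiply-add where the hardware supports it.
axpy_plain(a, x, y) = a * x + y
axpy_fused(a, x, y) = muladd(a, x, y)

x = 0.3f0
println(phase_exp(x) ≈ phase_cis(x))
println(axpy_plain(2.0f0, x, 1.0f0) ≈ axpy_fused(2.0f0, x, 1.0f0))
```

Inside a kernel the same substitutions apply line by line. Whether they close the gap is something only benchmarking the actual kernel will tell you.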
