Timing square function in CUDA

I am trying to see whether there is any speed benefit to using Nvidia's pow function (CUDAnative.pow) over my own square kernel.

Here is the code.

using CuArrays, CUDAnative, CUDAdrv

function square(a, ndrange)
    # global linear thread index
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    if i <= ndrange    # guard against threads past the end of the array
        a[i] = a[i] * a[i]
    end
    return
end

function nv_pow(a, ndrange)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    if i <= ndrange
        a[i] = CUDAnative.pow(a[i], Int32(2))   # square via the pow routine
    end
    return
end

dims = (1000,1000)
a = rand(Float64,dims)
# display(a)
d_a = CuArray(a)

println("size of CuArray d_a : $(sizeof(d_a))")
ndrange = prod(dims)
threads=32
blocks = max(Int(ceil(ndrange/threads)), 1)
println("blocks is $blocks")
my_time = CUDAdrv.@elapsed @cuda blocks=blocks threads=threads square(d_a, ndrange);
result = Array(d_a);
d_a2 = CuArray(a)
nv_time = CUDAdrv.@elapsed @cuda blocks=blocks threads=threads nv_pow(d_a2, ndrange);
println("my_time = $my_time, nv_time = $nv_time")

What puzzles me is that no matter how big I make the array, 10x10 or 5000x5000, CUDAdrv.@elapsed always gives me essentially the same timing for each kernel: my_time is always around 0.0505 s, and nv_time is always around 0.1279 s (on my machine).
The second puzzle is that my_time is faster than nv_time; I would think that even if Nvidia did no optimization at all, pow should be at least as fast as my kernel.
Am I timing the actual kernel running time correctly?

See GPU randn way slower than rand?
I think you need @sync or the equivalent CUDAdrv.synchronize().
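For example (untested, reusing the variables from your script), something along these lines:

my_time = CUDAdrv.@elapsed begin
    @cuda blocks=blocks threads=threads square(d_a, ndrange)
    CUDAdrv.synchronize()   # block until the kernel has actually finished
end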

Couple of things:

  • you’re not warming up the compiler, so your timings include compilation time (see the sketch at the end of this list).
  • if you don’t understand the timings, run under nvprof (together with CUDAdrv.@profile and nvprof --profile-from-start off to get cleaner profiles), which always shows the raw kernel times. If there’s any disparity, as there would have been here, the issue lies with the benchmark driver.
  • for benchmarking such a simple operation, why not use CuArrays and BenchmarkTools? It would have avoided the issue altogether:
julia> A = CuArray{Float32}(undef, 1024, 1024);

julia> using BenchmarkTools, Statistics

julia> a = @benchmark CuArrays.@sync map!(x->x*x, A, A)
BenchmarkTools.Trial: 
  memory estimate:  3.58 KiB
  allocs estimate:  66
  --------------
  minimum time:     19.625 ms (0.00% GC)
  median time:      20.011 ms (0.00% GC)
  mean time:        20.227 ms (0.00% GC)
  maximum time:     29.341 ms (0.00% GC)
  --------------
  samples:          248
  evals/sample:     1

julia> b = @benchmark CuArrays.@sync map!(x->CUDAnative.pow(x, Int32(2)), A, A);

julia> judge(median(b), median(a))
BenchmarkTools.TrialJudgement: 
  time:   +3.08% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

julia> c = @benchmark CuArrays.@sync map!(x->CUDAnative.pow(x, 2f0), A, A);

julia> judge(median(c), median(a))
BenchmarkTools.TrialJudgement: 
  time:   +6.19% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

julia> d = @benchmark CuArrays.@sync map!(x->CUDAnative.pow_fast(x, 2f0), A, A);

julia> judge(median(d), median(a))
BenchmarkTools.TrialJudgement: 
  time:   +8.78% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  • “I think even if Nvidia does not do any optimization, it should at least be the same as my kernel”: no, pow is intended to be invoked with unknown exponents.
  • @y4lu, “I think you need the @sync”: only if you measure from the host, e.g. using BenchmarkTools. The OP uses CUDAdrv.@elapsed, which inserts performance events in the command queue and thus directly measures GPU time. I typically recommend BenchmarkTools and CuArrays.@sync though, because that avoids issues like including compilation time, as happened in this thread.
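For instance, to take compilation out of the original measurements (a rough, untested sketch reusing the arrays and launch configuration from the first post):

# launch each kernel once so that compilation happens before the timed runs
@cuda blocks=blocks threads=threads square(d_a, ndrange)
@cuda blocks=blocks threads=threads nv_pow(d_a2, ndrange)

# the events recorded by CUDAdrv.@elapsed now only bracket the kernel itself
my_time = CUDAdrv.@elapsed @cuda blocks=blocks threads=threads square(d_a, ndrange)
nv_time = CUDAdrv.@elapsed @cuda blocks=blocks threads=threads nv_pow(d_a2, ndrange)

# when running under `nvprof --profile-from-start off julia script.jl`,
# only the region wrapped in CUDAdrv.@profile shows up in the profile
CUDAdrv.@profile begin
    @cuda blocks=blocks threads=threads square(d_a, ndrange)
    @cuda blocks=blocks threads=threads nv_pow(d_a2, ndrange)
end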

Hi Tim,

This is really helpful. I tried inserting another line to warm up the device, but I got the same result as before, so I don’t think it is a warm-up issue.

But nvprof is working. Thank you!

I tried your code with @benchmark and CuArrays.@sync, and I have several questions:

  1. The first line (A = CuArray{Float32}(undef, 1024, 1024);) didn’t work for me, saying that Julia couldn’t find the method. I deleted the undef (like this: julia> A = CuArray{Float32}(1024, 1024);) and it worked. Not sure if that is new syntax; I was using Julia 1.0.2.
    So @benchmark combined with CuArrays.@sync times the total execution, that is, it includes both data transfer and kernel running time, right?

  2. Also, I think some kind of cache-clearing function must need to be called in between the @benchmark lines, because I tried this:

julia> using BenchmarkTools, CuArrays
julia> A = CuArray{Float32}(1024, 1024);
julia> a = @benchmark CuArrays.@sync map!(x->x*x, A, A)
BenchmarkTools.Trial:
  memory estimate:  3.50 KiB
  allocs estimate:  64
  --------------
  minimum time:     3.325 ms (0.00% GC)
  median time:      4.028 ms (0.00% GC)
  mean time:        4.020 ms (0.00% GC)
  maximum time:     4.749 ms (0.00% GC)
  --------------
  samples:          1243
  evals/sample:     1

julia> b = @benchmark CuArrays.@sync map!(x->x*x, A, A)
BenchmarkTools.Trial:
  memory estimate:  3.50 KiB
  allocs estimate:  64
  --------------
  minimum time:     5.860 ms (0.00% GC)
  median time:      6.275 ms (0.00% GC)
  mean time:        6.275 ms (0.00% GC)
  maximum time:     7.566 ms (0.00% GC)
  --------------
  samples:          796
  evals/sample:     1

julia> c = @benchmark CuArrays.@sync map!(x->x*x, A, A)
BenchmarkTools.Trial:
  memory estimate:  3.50 KiB
  allocs estimate:  64
  --------------
  minimum time:     7.549 ms (0.00% GC)
  median time:      7.930 ms (0.00% GC)
  mean time:        7.941 ms (0.00% GC)
  maximum time:     9.529 ms (0.00% GC)
  --------------
  samples:          630
  evals/sample:     1

Note that I called the same function each time, but the runs that execute later report increasingly longer times.

  3. Last question: I don’t see a speed difference between pow and pow_fast. Do you have a different experience?

Yeah I’m working with CuArrays master, where we’ve been moving towards Base Array APIs.
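Concretely, these two spellings allocate the same uninitialized 1024×1024 buffer; which one is accepted just depends on the CuArrays version you have installed:

A = CuArray{Float32}(undef, 1024, 1024)   # CuArrays master, Base-style constructor
A = CuArray{Float32}(1024, 1024)          # released CuArrays, older positional form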

No, it only includes data-transfer time if the transfer is part of the expression you’re @benchmarking. The @sync makes sure asynchronous operations, like kernels, are included in the timings. If you want to include memory transfers, you should allocate, execute, and transfer back inside the benchmarked expression.
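For example (untested sketch), including the transfers in the timed expression would look something like:

h = rand(Float32, 1024, 1024)

# only the kernel (plus the sync) is timed here
@benchmark CuArrays.@sync map!(x->x*x, A, A)

# here the upload, the kernel, and the download are all part of the timed expression
@benchmark begin
    d = CuArray(h)                       # host -> device transfer
    CuArrays.@sync map!(x->x*x, d, d)    # kernel
    Array(d)                             # device -> host transfer
end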

Hard to be sure what’s happening there; you can always run that code under nvprof too. Maybe the allocator is acting up? After a while the GPU allocation pool will have been exhausted, which triggers a GC sweep and/or new actual allocations.
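One way to test that guess (again, just a sketch; whether it helps depends on your CuArrays version) is to force a garbage collection between the trials and see whether the times stay flat:

a = @benchmark CuArrays.@sync map!(x->x*x, A, A)
GC.gc()   # frees unreferenced CuArrays, giving their buffers back to the pool
b = @benchmark CuArrays.@sync map!(x->x*x, A, A)
GC.gc()
c = @benchmark CuArrays.@sync map!(x->x*x, A, A)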

I’ve definitely seen the _fast versions of intrinsics execute, well, faster, but it depends on the application as well as the GPU. YMMV.
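If you want to check what you’re actually getting on your GPU, CUDAnative’s reflection macros can dump the generated device code; for your original kernel that would be something like (untested sketch):

# prints the PTX generated for the launched kernel, so you can see whether the
# pow call becomes a fast intrinsic or a full libdevice pow routine
CUDAnative.@device_code_ptx @cuda blocks=blocks threads=threads nv_pow(d_a2, ndrange)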