How should I approach GPU programming?

Skoffer · April 16, 2020, 1:05pm

One more thing that may be useful (though it won’t gain you much, since it’s O(n) and your bottleneck is O(n^2)): you can remove

    @inbounds for i = 1:nCoords
        u[i]  = u_tmp[i];
        pg[i] = pg_tmp[i];
    end

and instead swap the bindings outside of the PackStep function, something like this:

    for _ in 1:number_of_steps
        PackStep(pg, pg_tmp, u, u_tmp, nCoords, nTot)
        pg, pg_tmp = pg_tmp, pg
        u, u_tmp = u_tmp, u
    end

In this case the names are simply rebound to the other array and nothing is copied, so you can squeeze out a few additional nanoseconds.
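
To make the swap idea concrete, here is a minimal self-contained sketch. Note that step! is a hypothetical stand-in I made up for PackStep (it only adds 1.0 to every element), and run_swapped is likewise just an illustrative name. I put the loop inside a function so the swap rebinds local variables, which means the caller has to use the returned array.

```julia
# Hypothetical stand-in for PackStep: reads `u` and writes the result into `u_tmp`.
function step!(u_tmp, u)
    @inbounds for i in eachindex(u, u_tmp)
        u_tmp[i] = u[i] + 1.0
    end
    return nothing
end

# Run `nsteps` steps, swapping the two bindings after each step instead of copying.
function run_swapped(u, u_tmp, nsteps)
    for _ in 1:nsteps
        step!(u_tmp, u)
        u, u_tmp = u_tmp, u   # rebinds the local names only; no elements are copied
    end
    return u                  # after the last swap, `u` names the newest data
end

u0 = zeros(4)
buf = similar(u0)
result = run_swapped(u0, buf, 3)
```

Which of the two arrays ends up holding the latest state depends on whether the number of steps is odd or even, so it is probably safest to always work with the returned array rather than assume it is the one you passed in as u.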

P.S.: for nearest neighbours, this thread may be relevant: "Cell list algorithm is slower than double for loop"
