Hello!
This is the first time I have really dived into GPU programming. I have followed this tutorial, https://juliagpu.gitlab.io/CUDA.jl/tutorials/introduction/, and have started to get a feel for some simple operations. I am already seeing promising results.
In the end I would like to write a “kernel” that evaluates a function like this:
using LinearAlgebra   # norm
using Base.Threads    # @threads

# H, H1, AD, FAC, BETA, ZETA, V and DT are globals defined elsewhere (not shown).
function PackStep(pg, pg_tmp, u, u_tmp, nCoords, nTot)
    @fastmath @inbounds @threads for i = 1:nCoords
        Wgx = 0.0
        Wgz = 0.0
        p_i = pg[i]
        # Accumulate the contributions of all other particles to particle i
        @inbounds @simd for j = 1:nTot
            if i != j
                p_j  = pg[j]
                rij  = p_i .- p_j
                RIJ  = norm(rij)
                RIJ1 = 1.0 / RIJ
                q    = RIJ / H
                if q <= 2
                    qq3  = q * (q - 2)^3
                    Wq   = AD * FAC * qq3
                    x_ij = rij[1]
                    z_ij = rij[3]
                    Wgx += Wq * (x_ij * RIJ1) * H1
                    Wgz += Wq * (z_ij * RIJ1) * H1
                end
            end
        end
        # Define here since I use it a lot
        u_i = u[i]
        dux = (-BETA * Wgx * V - ZETA * u_i[1]) * DT
        duz = (-BETA * Wgz * V - ZETA * u_i[3]) * DT
        dx  = dux * DT
        dz  = duz * DT
        # Write to the temporary arrays so other threads still read the old state
        u_tmp[i]  = u_i .+ (dux, 0.0, duz)
        pg_tmp[i] = pg[i] .+ (dx, 0.0, dz)
    end
    # Copy the updated values back once every particle has been processed
    @inbounds @simd for i = 1:nCoords
        u[i]  = u_tmp[i]
        pg[i] = pg_tmp[i]
    end
end
To give some intuition about the function: it calculates how a particle, p_i, is going to move depending on Wgx and Wgz. For each particle from 1 to nCoords, it works out how that particle should move based on all the other particles, which are visited in the inner loop over j = 1:nTot.
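The inner loop described above can be sketched as a standalone function that uses only scalar arithmetic, which is the shape a single GPU thread would typically execute for one particle. This is a minimal sketch, not the original implementation: the name gradient_sum is hypothetical, the arguments h, ad and fac stand in for the globals H, AD and FAC, it assumes H1 = 1/H (the post does not say so), and it loops over every entry of pg rather than taking nTot.

```julia
# Hypothetical helper: sums the x and z contributions for particle i.
# Assumes H1 == 1/H, which the original post does not state.
function gradient_sum(pg, i, h, ad, fac)
    wgx = 0.0
    wgz = 0.0
    p_i = pg[i]
    for j in eachindex(pg)
        j == i && continue              # skip the particle itself
        p_j = pg[j]
        dx = p_i[1] - p_j[1]
        dy = p_i[2] - p_j[2]
        dz = p_i[3] - p_j[3]
        r = sqrt(dx * dx + dy * dy + dz * dz)   # same value as norm(rij)
        q = r / h
        if q <= 2
            wq = ad * fac * q * (q - 2)^3
            wgx += wq * (dx / r) / h
            wgz += wq * (dz / r) / h
        end
    end
    return wgx, wgz
end

# Two particles one unit apart along x; the constants are made-up values.
pg_demo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
w1 = gradient_sum(pg_demo, 1, 1.0, 1.0, 1.0)
w2 = gradient_sum(pg_demo, 2, 1.0, 1.0, 1.0)
```

Because each call reads pg and returns two numbers without touching shared state, every particle can be processed independently, which is why the outer loop is the natural thing to parallelise.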
The first four inputs are Array{Tuple{Float64,Float64,Float64},2} with N particles in X, Y and Z coordinates, and the last two are just Int64. The reason for using tuples is that they have given very good performance in the CPU implementation, but since they do not allow in-place operations, they seem unfit for GPUs. Standard arrays seem to perform badly as well, at least when I checked with the CPU implementation.
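A small CPU-only check may help with the tuple concern. A tuple value itself cannot be modified, but the array slot holding it can be overwritten with a new tuple, which is what the function above already does. The sketch below also checks that the element type is a plain bits type; as far as I know that is the property CUDA.jl's CuArray needs from its element type, but treat that as an assumption to verify against the CUDA.jl documentation. The helper name shift is hypothetical.

```julia
# Hypothetical helper: returns a new tuple, the input is left untouched.
shift(p::NTuple{3,Float64}, d::NTuple{3,Float64}) = p .+ d

pg_small = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]   # Vector of immutable 3-tuples

# The tuple is immutable, but the array slot is overwritten in place.
pg_small[1] = shift(pg_small[1], (0.5, 0.0, -0.5))

# Tuples of Float64 are plain bits types and are stored inline in the array.
tuple_is_bits = isbitstype(eltype(pg_small))
```

If that assumption holds, the tuple layout does not have to be abandoned just to move the data to the GPU.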
My question is quite open-ended since I am new to this, so here are a few specific ones.
So far I have been testing with CUDA, but would something like ArrayFire be a better choice?
What should I replace the tuples with, or can I get them to work on the GPU?
If anyone has suggestions for resources to work through gradually on GPU programming in Julia, please do share.
Kind regards