Elegant kernel for vectors in matrix

I’m trying to write a kernel that does some operations on vectors using CUDA.jl and @cuda. My code is essentially this:

points = Nx3 matrix with the x, y, z coordinates of the points
elements = Nx4 matrix of integers; each row lists the rows of "points" that make up a quadrilateral
A = NxN matrix
for i in 1:N
    for j in 1:N
        a bunch of operations (and, unfortunately, some if statements) involving the points in elements[i,:] and elements[j,:]
        A[i,j] = a function of those operations
    end
end

A lot of the operations involve dot and cross. On the CPU I deal with that in a pretty clean way. Say:

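using LinearAlgebra                   # provides dot and cross
point1 = points[elements[i, 1], :]    # gives me an array with 3 numbers x, y, z
vec1 = point1 - point2                # point2 is extracted the same way as point1
foo = dot(vec1, vec2)                 # vec2 is built like vec1, from another pair of points
bar = cross(vec1, vec2)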

Is there an elegant way to do that on the GPU? Slicing with [i,:] doesn’t seem to work, and extracting subvectors of an array doesn’t seem easy either (nor does assigning to them, say A[1,:] = b).
I don’t even know how to create an array inside a kernel. The only way I managed to do any math was to index each individual element and do the calculations the classic way (i.e., old-school C style, not Matlab style).
Thanks a lot!

Check out the StaticArrays.jl package. Make points into a Vector containing SVectors.
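
A minimal sketch of what that could look like on the CPU. The sample data and the names points_matrix, p1, p2 and p4 are made up for illustration; only the idea of storing each point as an SVector comes from the suggestion above.

```julia
using LinearAlgebra   # dot, cross
using StaticArrays    # SVector

# Hypothetical sample data: four corners of a unit square, one quadrilateral.
points_matrix = [0.0 0.0 0.0;
                 1.0 0.0 0.0;
                 1.0 1.0 0.0;
                 0.0 1.0 0.0]
elements = [1 2 3 4]

# Turn each row of the N×3 matrix into an SVector{3,Float64}.
points = [SVector{3,Float64}(points_matrix[k, 1], points_matrix[k, 2], points_matrix[k, 3])
          for k in axes(points_matrix, 1)]

i = 1
p1 = points[elements[i, 1]]   # a whole point, no [i, :] slicing needed
p2 = points[elements[i, 2]]
p4 = points[elements[i, 4]]

vec1 = p2 - p1                # SVector arithmetic does not allocate on the heap
vec2 = p4 - p1
foo = dot(vec1, vec2)
bar = cross(vec1, vec2)
```

Because an SVector has a fixed size known at compile time, indexing points[k] returns the whole point as a value, which is what makes the dot and cross calls read the same as the slicing version.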

Ah, that does seem more elegant than what I’m doing on the CPU!
A = CuArray(Vector{SVector{3,Float64}}(undef, 10))
works and I can do dot and cross! Thanks a lot!
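
For reference, a rough sketch of how the double loop from the question might look as a kernel once the points are SVectors. It assumes a CUDA-capable GPU. The kernel name fill_A!, the 16×16 launch configuration, the sample data, and the formula written into A are all placeholders, not the actual computation from the question.

```julia
using CUDA, StaticArrays, LinearAlgebra

# Hypothetical kernel: one thread per (i, j) pair of elements.
function fill_A!(A, points, elements)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    j = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    if i <= size(A, 1) && j <= size(A, 2)
        @inbounds begin
            # Each points[...] is an SVector, so these are plain stack values.
            vec1 = points[elements[i, 2]] - points[elements[i, 1]]
            vec2 = points[elements[j, 4]] - points[elements[j, 1]]
            n = cross(vec1, vec2)
            A[i, j] = dot(n, n) + dot(vec1, vec2)   # placeholder formula
        end
    end
    return nothing
end

# Hypothetical sample data: a unit square and two quadrilaterals over it.
points = CuArray([SVector(0.0, 0.0, 0.0), SVector(1.0, 0.0, 0.0),
                  SVector(1.0, 1.0, 0.0), SVector(0.0, 1.0, 0.0)])
elements = CuArray(Int32[1 2 3 4; 2 3 4 1])

N = size(elements, 1)
A = CUDA.zeros(Float64, N, N)

threads = (16, 16)
blocks = (cld(N, threads[1]), cld(N, threads[2]))
@cuda threads=threads blocks=blocks fill_A!(A, points, elements)

A_host = Array(A)   # copy the result back to the CPU
```

The bounds check at the top of the kernel is needed because the grid is rounded up to whole blocks, so some threads fall outside A.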

@dpsanders you, sir, have just sped up a certain part of my code 10x on the CPU! Many thanks!
Now on to the GPU.
