Cross product with CUDA.jl

Hello,
I am looking for a way to compute a cross product on a GPU. Basically something similar to this:

function cross(w::CuArray{Float32,1}, u::CuArray{Float32,1}, v::CuArray{Float32,1})
    w[1] = u[2] * v[3] - u[3] * v[2]
    w[2] = u[3] * v[1] - u[1] * v[3]
    w[3] = u[1] * v[2] - u[2] * v[1]
end

Can this be done with CuArrays alone, or would I need to write a kernel?

You could do something like this, although I’m not sure working with length-3 CuArrays is ever a good idea.

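using Permutations, TensorCore, LinearAlgebra
# Negated Levi-Civita symbol: the contraction below leaves the middle index free, which flips the sign.
const epsilon = Float64[allunique([i,j,k]) && -sign(Permutation([i,j,k])) for i in 1:3, j in 1:3, k in 1:3]
# ⊡ contracts the last index of its left argument with the first index of its right argument.
mycross(u,v) = u ⊡ epsilon ⊡ v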

u, v = rand(3), rand(3)
cross(u,v) ≈ mycross(u,v)
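
To make the sign convention explicit, here is a rough CPU-only sketch of the same contraction written as plain loops. It uses only LinearAlgebra; the names levicivita and cross_by_contraction are made up for illustration and are not part of any package.

```julia
using LinearAlgebra

# Levi-Civita symbol for indices in 1:3:
# +1 for even permutations, -1 for odd ones, 0 if any index repeats.
levicivita(i, j, k) = (i - j) * (j - k) * (k - i) ÷ 2

# Contract u with the first index and v with the last index, leaving the
# middle index j free. The minus sign mirrors the negated `epsilon` above.
function cross_by_contraction(u, v)
    w = zeros(promote_type(eltype(u), eltype(v)), 3)
    for i in 1:3, j in 1:3, k in 1:3
        w[j] -= u[i] * levicivita(i, j, k) * v[k]
    end
    return w
end

u, v = rand(3), rand(3)
cross_by_contraction(u, v) ≈ cross(u, v)
```

The loop form is only meant to show what the tensor contraction computes; it is not something to run on a GPU as written.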

If you need many cross products, I’d make two CuVectors containing many StaticArrays and then broadcast the cross product over them. I haven’t thought it through in detail, but it should work.
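
A minimal sketch of that idea, assuming CUDA.jl and StaticArrays.jl are installed and a CUDA-capable GPU is available. The names n, us, vs and ws are placeholders chosen for illustration, and this has not been benchmarked.

```julia
using CUDA, StaticArrays, LinearAlgebra

n = 1_000  # number of cross products to compute (arbitrary choice)

# Build the inputs on the host as vectors of 3-element static vectors,
# then copy them to the GPU. SVector{3,Float32} is an isbits type,
# so it can be stored directly in a CuArray.
us = CuArray([rand(SVector{3,Float32}) for _ in 1:n])
vs = CuArray([rand(SVector{3,Float32}) for _ in 1:n])

# Broadcasting fuses into a single GPU kernel; each thread computes one
# 3-element cross product without any scalar indexing from the host.
ws = cross.(us, vs)

# Copy back and compare against the CPU result as a sanity check.
all(Array(ws) .≈ cross.(Array(us), Array(vs)))
```

Storing the data as one vector of SVectors keeps each triple together in memory, and no hand-written kernel is needed because the broadcast is compiled for the GPU.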