ArrayFire.jl: Performant way to calculate a vector cross product?

Is there a performant way to calculate a vector cross product with ArrayFire.jl? ArrayFire does not provide a cross-product function, and Julia's getindex() and setindex!() on an AFArray move data between the host and the device, so defining the cross product like this is slow:

```julia
# Slow: each scalar getindex (a[i], b[i]) causes a host/device transfer
cross(a::ArrayFire.AFArray, b::ArrayFire.AFArray) = ArrayFire.AFArray([a[2]*b[3]-a[3]*b[2]; a[3]*b[1]-a[1]*b[3]; a[1]*b[2]-a[2]*b[1]])
```

I need to run this calculation for about 175 000 vector pairs, and the cross product is only one part of the code, so doing it on the GPU would speed the code up considerably.
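One common way to avoid per-element transfers is to store all vectors as the columns of a 3×N array and compute every cross product at once with whole-row slices and broadcasting, so the work becomes a few large element-wise operations instead of many tiny ones. The sketch below uses plain Julia `Array`s so it can be checked on the CPU; `batched_cross` is a hypothetical name. Whether the same function runs entirely on the device when given `AFArray`s is an assumption: it depends on ArrayFire.jl supporting range indexing, broadcast arithmetic and `vcat` on `AFArray` without copying to the host, which I have not verified.

```julia
using LinearAlgebra  # only needed to compare against the reference `cross`

# Hypothetical helper: batched cross product where each column of `a` and `b`
# is one 3-vector. Only row slices and broadcast arithmetic are used, so no
# element-by-element (scalar) indexing is performed.
function batched_cross(a::AbstractMatrix, b::AbstractMatrix)
    size(a, 1) == 3 && size(b, 1) == 3 || throw(ArgumentError("inputs must be 3×N"))
    size(a) == size(b) || throw(DimensionMismatch("a and b must have the same size"))
    # Range indices (1:1 etc.) keep each row as a 1×N matrix,
    # so vcat stacks the three result rows into a 3×N matrix.
    ax, ay, az = a[1:1, :], a[2:2, :], a[3:3, :]
    bx, by, bz = b[1:1, :], b[2:2, :], b[3:3, :]
    return vcat(ay .* bz .- az .* by,
                az .* bx .- ax .* bz,
                ax .* by .- ay .* bx)
end

# Example: 175 000 random vector pairs stored column-wise
a = rand(Float32, 3, 175_000)
b = rand(Float32, 3, 175_000)
c = batched_cross(a, b)  # 3×175000 matrix, column i is a[:, i] × b[:, i]
```

The design choice is to move the loop over the 175 000 pairs into the array dimension, which is the layout GPU array libraries generally handle well.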