Hi,
I am wondering whether there is a way to pass a CuArray or CuPtr from CUDA.jl directly to C/C++, the same way one passes a Core.Array or Core.Ref. For example, I would like to define a C function that takes a device memory pointer (of type double*, cuComplex*, etc.) and pass a CuArray or CuPtr to it.
I ask because I want to minimize data transfer between the device and the host. Of course, I could pass the data to a C function by first copying it to the host, but for large CuArrays this would add significant overhead…