GPU sort WIP (GPU 1000x faster than CPU? I must be doing something wrong)

I didn’t know that. I read about Thrust.

So those are written in CUDA C, right? I am hoping to implement the same thing in CUDAnative and compare the performance.

Also, if there is a convenient way to call that library from Julia on CuArrays, that would be super cool too!