GPU sort WIP (GPU 1000x faster than CPU? I must be doing something wrong)

Yes, quite a few CUDA features are missing. I’d love to spend more time on them, but there’s been very little interest so far. So please do open issues or ask for help if you’re interested.