Combining CPU and GPU

Stupid question: did you measure with @btime and check with a profiler? I’m only halfway qualified to talk about the CPU part of the question, but I would suspect that @batch or @tturbo should do better than plain threading on the CPU. See this thread for example.
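To make the CPU comparison concrete, here is a minimal sketch of what I mean. The scaling kernel, the function names, and the array size are hypothetical stand-ins for your actual code, and it assumes BenchmarkTools, Polyester, and LoopVectorization are installed and that Julia was started with several threads (e.g. `julia -t auto`):

```julia
using BenchmarkTools      # provides @btime
using Polyester           # provides @batch
using LoopVectorization   # provides @tturbo
using Profile             # standard-library sampling profiler

# Hypothetical stand-in kernel: y[i] = a * x[i]

# Baseline: Base threading
function scale_threads!(y, a, x)
    Threads.@threads for i in eachindex(x, y)
        @inbounds y[i] = a * x[i]
    end
    return y
end

# Polyester: lower-overhead threading
function scale_batch!(y, a, x)
    @batch for i in eachindex(x, y)
        @inbounds y[i] = a * x[i]
    end
    return y
end

# LoopVectorization: threaded and SIMD-vectorized
function scale_tturbo!(y, a, x)
    @tturbo for i in eachindex(x, y)
        y[i] = a * x[i]
    end
    return y
end

x = rand(Float32, 10^6)
y = similar(x)
a = 2.0f0

# Interpolate globals with $ so @btime times the kernel, not global lookups
@btime scale_threads!($y, $a, $x)
@btime scale_batch!($y, $a, $x)
@btime scale_tturbo!($y, $a, $x)

# Profile repeated calls so the sampler collects enough samples
Profile.clear()
@profile for _ in 1:200
    scale_threads!(y, a, x)
end
Profile.print()
```

Which variant wins will depend on the kernel and the problem size, so treat this only as a template for measuring your own code.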

And is a representative MWE really out of reach?

As for your original question: I found this paper, but I haven’t seen any mention of such technology in this group recently. The first hit when searching Discourse is this thread.