I think the correct way to write generic CPU/GPU code is to use Cassette.jl. Take a look at how it is done in, e.g., KernelAbstractions.jl:
https://github.com/JuliaGPU/KernelAbstractions.jl/blob/master/src/backends/cuda.jl#L240
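To give a rough idea of the mechanism, here is a minimal sketch of Cassette's contextual dispatch. It is not the KernelAbstractions.jl code: `DeviceCtx`, `device_sin`, and `kernel_body` are hypothetical names made up for illustration, and `device_sin` is just a stand-in for whatever device intrinsic a real GPU backend would substitute. Only `Cassette.@context` and `Cassette.overdub` are actual Cassette.jl API.

```julia
using Cassette

# A context type that marks "we are running under the device backend".
# DeviceCtx is a hypothetical name for this sketch.
Cassette.@context DeviceCtx

# Hypothetical stand-in for a device intrinsic. It is a crude Taylor
# approximation so that the substitution is visible in the result.
device_sin(x::Float32) = x - x^3 / 6

# Inside DeviceCtx, every call to sin(::Float32) is redirected to device_sin.
Cassette.overdub(::DeviceCtx, ::typeof(sin), x::Float32) = device_sin(x)

# Generic user code that knows nothing about the backend.
kernel_body(x) = sin(x) + 1f0

plain     = kernel_body(0.5f0)                                 # uses Base.sin
overdubbed = Cassette.overdub(DeviceCtx(), kernel_body, 0.5f0)  # uses device_sin
```

The point is that `kernel_body` is written once, and the backend decides which implementation of `sin` it ends up calling by running it under a context.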