Is there a way to use @allowscalar in a heterogeneous manner using KernelAbstractions?

Hi all,

Is it correct that scalar indexing like Out[1] is not allowed with KernelAbstractions.jl alone, and that we need GPUArrays.@allowscalar for this? Should we always use GPUArrays together with KernelAbstractions for such cases?

using KernelAbstractions
using CUDA
using GPUArrays

backend = CUDABackend()
Out = KernelAbstractions.zeros(backend, Float64, 1)  # allocates a device array (a CuArray here)
GPUArrays.@allowscalar Out[1]                        # scalar indexing on a GPU array is disallowed unless wrapped in @allowscalar

You can simply copy the array back to the CPU first, which is what Out[1] does behind the scenes anyway.
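For example, a minimal sketch reusing the Out array from the snippet above; copying the whole array gives you an ordinary Julia Array that you can index freely:

host = Array(Out)   # one bulk device-to-host copy
host[1]             # plain scalar indexing on the CPU, no @allowscalar needed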


Does CUDA.jl copy the whole array back to the CPU?

It does not: GPUArrays.jl/src/host/indexing.jl at e8e9b031613f31818e75a6c7f8745788fb80b71f · JuliaGPU/GPUArrays.jl · GitHub
So yes, if your data is very large it’s better to index a single item (which only copies that element) or perform a fine-grained copy yourself, rather than copying the whole array back.
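A fine-grained copy might look like this (a sketch, again assuming the Out array from the snippet above; the offset form of copyto! moves only the requested elements into a small host buffer):

buf = Vector{Float64}(undef, 1)
copyto!(buf, 1, Out, 1, 1)   # dest, dest offset, src, src offset, number of elements
buf[1]                       # read the value on the host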