# GPU map without reduction over multiple array indices

Hello,

I would like to use the GPU to fill a 4-D array A with the values of a 2-component function f over a large multi-dimensional domain (X, Y, T), so A will have size (2, NX, NY, NT). X, Y and T are 1-D arrays.

I tried to use @SimonDanisch's code from *An Introduction to GPU Programming in Julia* (Nextjournal) with a simple map kernel.
This is what I wrote:

```julia
using GPUArrays, CuArrays
using SharedArrays
using DistributedArrays
using BenchmarkTools

function Base.map!(f::Function, A::GPUArray, X::GPUArray, Y::GPUArray, T::GPUArray)
    # our function that will run on the gpu
    function kernel(state, f, A, X, Y, T)
        # If launch parameters aren't specified, linear_index gets the index
        # into the Array passed as second argument to gpu_call (`A`)
        I = CartesianIndex(state)
        if I[2]*I[3]*I[4] <= length(A)/2
            @inbounds A[1:2, I[2], I[3], I[4]] = f(X[I[2]], Y[I[3]], T[I[4]])
        end
        return
    end
    # call kernel on the gpu
    gpu_call(kernel, A, (f, A, X, Y, T))
end

# on the GPU:
NX, NY, NT = 10, 15, 13
X, Y, T = rand(NX), rand(NY), rand(NT)
a = zeros(2, NX, NY, NT)

kernel(x, y, t) = [x + y, y - t]

xgpu, ygpu, tgpu, agpu = cu(X), cu(Y), cu(T), cu(a)
gpu_t = @belapsed begin
    map!($kernel, $agpu, $xgpu, $ygpu, $tgpu)
    GPUArrays.synchronize($agpu)
end
```

I get the following error:

```
GPU compilation of kernel(CuArrays.CuKernelState, typeof(kernel), CUDAnative.CuDeviceArray{Float32,4,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}) failed
KernelError: kernel returns a value of type Union{}

Make sure your kernel function ends in `return`, `return nothing` or `nothing`.
If the returned value is of type `Union{}`, your Julia code probably throws an exception.
Inspect the code with `@device_code_warntype` for more details.
```

I don't know how to use the `@device_code_warntype` macro to debug the code. I am working on a cluster; I set `JULIA_DEBUG=CUDAnative` in the terminal before launching the Julia REPL, but I got `UndefVarError: @device_code_warntype not defined` when I ran the code above.

You need to import CUDAnative to have access to those functions and macros.

How did you write that code? The call to `CartesianIndex(state)` cannot be inferred by Julia, and I find nothing like it in Simon's tutorial. Instead, he uses `GPUArrays.linear_index`.
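For what it's worth, a kernel that only gets a linear index can still recover per-dimension subscripts with `CartesianIndices`; this is plain Julia indexing, not GPU-specific, so it can be illustrated on the CPU (a minimal sketch of my own, not taken from the tutorial):

```julia
# Map a linear index over the (NX, NY, NT) domain to (ix, iy, it) subscripts,
# the way a GPU kernel would after obtaining `i` from linear_index(state).
NX, NY, NT = 10, 15, 13
ci = CartesianIndices((NX, NY, NT))

i = 27              # some linear index into the NX*NY*NT domain
I = ci[i]           # corresponding Cartesian index (column-major order)
ix, iy, it = Tuple(I)

# Round-trip check: the subscripts map back to the same linear index.
LinearIndices((NX, NY, NT))[ix, iy, it]   # == 27
```

Because `CartesianIndices` is a lightweight, non-allocating object, this decomposition is the usual way to do N-dimensional work inside a flat-indexed kernel.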

Also note that I would not recommend either using GPUArrays directly or extending `map!` for such an operation; it seems you would be much better off modeling this as a use of `map` or `broadcast` (though it's hard to predict whether that's possible without more details of the problem), or just writing a regular kernel without the GPUArrays abstractions on top. See https://juliagpu.gitlab.io/CuArrays.jl/tutorials/generated/intro/
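To sketch what the broadcast approach could look like for this particular f (my own illustration, shown on CPU arrays; the assumption is that each component of f can be written as a separate scalar broadcast, and that the same dotted calls would run on `cu(...)` arrays unchanged):

```julia
# Reshape X, Y and T so their non-singleton dimensions don't overlap;
# broadcasting then expands each over the full (NX, NY, NT) domain.
NX, NY, NT = 10, 15, 13
X, Y, T = rand(NX), rand(NY), rand(NT)

Xr = reshape(X, NX, 1, 1)   # varies along dimension 1
Yr = reshape(Y, 1, NY, 1)   # varies along dimension 2
Tr = reshape(T, 1, 1, NT)   # varies along dimension 3

# Fill each component of A with one fused broadcast, instead of having
# the kernel allocate a 2-element array per point (which fails on the GPU).
A = zeros(2, NX, NY, NT)
A[1, :, :, :] .= Xr .+ Yr   # first component of f(x, y, t) = [x + y, y - t]
A[2, :, :, :] .= Yr .- Tr   # second component
```

The key point is that no custom kernel is needed: each `.=` line compiles to a single elementwise operation over the whole domain, which is exactly the pattern the GPU array types are designed to handle.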