I’ve been trying to convert the `rotate_kernel` function from the 2021 JuliaCon GPU Workshop to work with CUDA, but I’m having trouble.

I’m new to writing GPU kernels. My current implementation of these functions is as follows:

```
function rotate_kernel(out, inp, angle)
    x_idx = (blockDim().x * (blockIdx().x - 1)) + threadIdx().x
    y_idx = (blockDim().y * (blockIdx().y - 1)) + threadIdx().y
    x_centidx = x_idx - (size(inp,1)÷2)
    y_centidx = y_idx - (size(inp,2)÷2)
    x_outidx = round(Int, (x_centidx*cos(angle)) + (y_centidx*-sin(angle)))
    y_outidx = round(Int, (x_centidx*sin(angle)) + (y_centidx*cos(angle)))
    x_outidx += (size(inp,1)÷2)
    y_outidx += (size(inp,2)÷2)
    if (1 <= x_outidx <= size(out,1)) &&
       (1 <= y_outidx <= size(out,2))
        out[x_outidx, y_outidx] = inp[x_idx, y_idx]
    end
    return
end
```

and

```
function exec_gpu(f, sz, args...)
    @cuda f(args...)
end
```

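where my “lilly” array is a 250x250 array of `Float64` (the default element type of `rand`, and what the kernel signature in the output further down reports):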

```
lilly = rand(250,250)
lilly_gpu = CuArray(lilly)
lilly_rotated = similar(lilly_gpu)
lilly_rotated .= 0
```

and my functions are being called as follows:

```
exec_gpu(rotate_kernel, size(lilly_gpu), lilly_rotated, lilly_gpu, deg2rad(37))
```

This seems to run fine on my machine, giving me an output of:

```
CUDA.HostKernel{typeof(rotate_kernel), Tuple{CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1}, Float64}}(rotate_kernel, CuFunction(Ptr{Nothing} @0x00000000c2fbef10, CuModule(Ptr{Nothing} @0x00000000c29103b0, CuContext(0x000000008baf59a0, instance 83eb46d269112d80))), CUDA.KernelState(Ptr{Nothing} @0x0000000604000000))
```

But when I try to display `Array(lilly_rotated)` as an image, only the array of zeros is shown, not the rotated array. What am I doing wrong? Any help is appreciated!
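For comparison, here is a minimal CPU sketch of the same index mapping, which is roughly what I expect the kernel to produce. The name `rotate_cpu!` is my own and is not from the workshop. It loops over every input pixel instead of relying on thread and block indices, so it only serves as a reference for checking the GPU result.

```julia
# CPU reference for the forward mapping used in rotate_kernel.
# `rotate_cpu!` is a hypothetical helper name, not part of the workshop code.
function rotate_cpu!(out, inp, angle)
    cx = size(inp, 1) ÷ 2          # centre offsets, as in the kernel
    cy = size(inp, 2) ÷ 2
    s, c = sincos(angle)
    for y_idx in axes(inp, 2), x_idx in axes(inp, 1)
        x_cent = x_idx - cx
        y_cent = y_idx - cy
        # rotate about the centre, then shift back to array coordinates
        x_out = round(Int, x_cent * c - y_cent * s) + cx
        y_out = round(Int, x_cent * s + y_cent * c) + cy
        # only write pixels that land inside the output array
        if 1 <= x_out <= size(out, 1) && 1 <= y_out <= size(out, 2)
            out[x_out, y_out] = inp[x_idx, y_idx]
        end
    end
    return out
end

lilly_cpu = rand(250, 250)
rotated_cpu = rotate_cpu!(zeros(250, 250), lilly_cpu, deg2rad(37))
```

On the CPU, `rotated_cpu` contains many non-zero entries, so I would expect `Array(lilly_rotated)` to look similar if the kernel were visiting every pixel.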