I’ve been trying to convert the rotate_kernel function from the 2021 JuliaCon GPU Workshop to work with CUDA, but I’m having trouble.
I’m new to writing GPU kernels; my current implementation of these functions is as follows:
    function rotate_kernel(out, inp, angle)
        x_idx = (blockDim().x * (blockIdx().x - 1)) + threadIdx().x
        y_idx = (blockDim().y * (blockIdx().y - 1)) + threadIdx().y
        x_centidx = x_idx - (size(inp,1)÷2)
        y_centidx = y_idx - (size(inp,2)÷2)
        x_outidx = round(Int, (x_centidx*cos(angle)) + (y_centidx*-sin(angle)))
        y_outidx = round(Int, (x_centidx*sin(angle)) + (y_centidx*cos(angle)))
        x_outidx += (size(inp,1)÷2)
        y_outidx += (size(inp,2)÷2)
        if (1 <= x_outidx <= size(out,1)) &&
           (1 <= y_outidx <= size(out,2))
            out[x_outidx, y_outidx] = inp[x_idx, y_idx]
        end
        return
    end
and
    function exec_gpu(f, sz, args...)
        @cuda f(args...)
    end
where my “lilly” array is a 250x250 array of Float64 (the default element type of rand):
    lilly = rand(250,250)
    lilly_gpu = CuArray(lilly)
    lilly_rotated = similar(lilly_gpu)
    lilly_rotated .= 0
and my functions are being called as follows:
    exec_gpu(rotate_kernel, size(lilly_gpu), lilly_rotated, lilly_gpu, deg2rad(37))
This seems to run fine on my machine, giving me an output of:
CUDA.HostKernel{typeof(rotate_kernel), Tuple{CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1}, Float64}}(rotate_kernel, CuFunction(Ptr{Nothing} @0x00000000c2fbef10, CuModule(Ptr{Nothing} @0x00000000c29103b0, CuContext(0x000000008baf59a0, instance 83eb46d269112d80))), CUDA.KernelState(Ptr{Nothing} @0x0000000604000000))
But when I try to display Array(lilly_rotated) as an image, only the array of zeros is shown, not the rotated array. What am I doing wrong? Any help is appreciated!
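
One thing that may be relevant here: exec_gpu receives sz but never uses it, and @cuda without threads/blocks keywords launches a single thread in a single block, so at most one pixel would ever be processed. Below is a minimal sketch of how a 2-D launch configuration could be derived from sz. The helper name launch_config and the 16x16 block shape are assumptions for illustration only; neither comes from the workshop or from CUDA.jl.

```julia
# Hypothetical helper (not part of CUDA.jl or the workshop code): compute a
# 2-D launch configuration whose grid covers an array of size `sz`.
function launch_config(sz::NTuple{2,Int}; threads::NTuple{2,Int} = (16, 16))
    # Ceiling division per dimension, so partially filled blocks at the
    # edges are still launched.
    blocks = cld.(sz, threads)
    return (threads = threads, blocks = blocks)
end

cfg = launch_config((250, 250))
# cfg.threads == (16, 16) and cfg.blocks == (16, 16), i.e. 256 threads per
# dimension, which is more than the 250 elements per dimension.

# With CUDA.jl loaded, exec_gpu could then launch the kernel like this
# (left as a comment so this sketch runs without a GPU):
#   @cuda threads=cfg.threads blocks=cfg.blocks f(args...)
```

Because the grid overshoots the array (256 threads versus 250 elements per dimension), the kernel would also need an early return when x_idx > size(inp,1) or y_idx > size(inp,2), otherwise inp[x_idx, y_idx] reads out of bounds.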