Hi,
To familiarize myself with the syntax of CUDA.jl, I’m implementing basic programs from scratch. I wrote a version of matrix multiplication that uses shared memory, but it’s not compiling.
function MatMulGPU!(A_d::AbstractMatrix, B_d::AbstractMatrix, C_d::AbstractMatrix, len::Int)
    colIdx = (blockIdx().x-1)*blockDim().x + threadIdx().x
    rowIdx = (blockIdx().y-1)*blockDim().y + threadIdx().y
    As = @cuStaticSharedMem(T::Float32, (blockDim().y, blockDim().x))
    Bs = @cuStaticSharedMem(T::Float32, (blockDim().y, blockDim().x))
    tmp = 0
    for i = 0:blockDim().x:len-1
        # Loading into shared memory
        As[threadIdx().y, threadIdx().x] = A_d[rowIdx, i+threadIdx().x]
        Bs[threadIdx().y, threadIdx().x] = B_d[i+threadIdx().y, colIdx]
        sync_threads()
        # MatMul
        for j=1:blockDim().x
            tmp += As[threadIdx().y, j] * Bs[j, threadIdx().x]
        end
        sync_threads()
    end
    C_d[rowIdx, colIdx] = tmp
    return nothing
end
The error that I’m getting is as follows:
GPU compilation of kernel MatMulGPU!(CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float32, 1}, Int64) failed
KernelError: kernel returns a value of type `Union{}`
Make sure your kernel function ends in `return`, `return nothing` or `nothing`.
If the returned value is of type `Union{}`, your Julia code probably throws an exception.
Inspect the code with `@device_code_warntype` for more details.
I can’t figure out the mistake in my code (I’m ending my kernel with `nothing`).
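For reference, here is a CPU-only sketch of the tiling scheme the kernel is meant to implement. It is not the GPU code and does not use CUDA.jl. The name `matmul_tiled_cpu!` and the `tile` parameter (standing in for a square block size, i.e. `blockDim().x == blockDim().y`) are made up for illustration, and the sketch assumes `len` is a multiple of `tile`.

```julia
# CPU-only reference for the tiled scheme (hypothetical helper, no CUDA.jl needed).
# `tile` plays the role of a square thread block; `len` must be a multiple of it.
function matmul_tiled_cpu!(A::AbstractMatrix, B::AbstractMatrix, C::AbstractMatrix,
                           len::Int, tile::Int)
    @assert len % tile == 0 "sketch assumes len is a multiple of tile"
    nblocks = div(len, tile)
    As = zeros(Float32, tile, tile)   # stand-in for the shared-memory tile of A
    Bs = zeros(Float32, tile, tile)   # stand-in for the shared-memory tile of B
    for by in 1:nblocks, bx in 1:nblocks          # one iteration per "thread block"
        acc = zeros(Float32, tile, tile)          # one accumulator per "thread"
        for i in 0:tile:(len - 1)
            # Load phase: every "thread" (ty, tx) copies one element into each tile.
            for ty in 1:tile, tx in 1:tile
                rowIdx = (by - 1) * tile + ty
                colIdx = (bx - 1) * tile + tx
                As[ty, tx] = A[rowIdx, i + tx]
                Bs[ty, tx] = B[i + ty, colIdx]
            end
            # Multiply phase (on the GPU, sync_threads() separates the two phases).
            for ty in 1:tile, tx in 1:tile, j in 1:tile
                acc[ty, tx] += As[ty, j] * Bs[j, tx]
            end
        end
        # Write each "thread's" result back to its element of C.
        for ty in 1:tile, tx in 1:tile
            C[(by - 1) * tile + ty, (bx - 1) * tile + tx] = acc[ty, tx]
        end
    end
    return nothing
end

# Small usage example with arbitrary sizes (8x8 matrices, 4x4 tiles).
A = rand(Float32, 8, 8)
B = rand(Float32, 8, 8)
C = zeros(Float32, 8, 8)
matmul_tiled_cpu!(A, B, C, 8, 4)
```

On these inputs the result in `C` should agree with `A * B` up to Float32 rounding, which is the behaviour I expect from the GPU kernel once it compiles.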
Thanks