GPU Compilation Error: KernelError: kernel returns a value of type `Union{}`

Hi,

To familiarize myself with the syntax of CUDA.jl, I’m implementing basic programs from scratch. I wrote a version of matrix multiplication that uses shared memory, but it’s not compiling.

function MatMulGPU!(A_d::AbstractMatrix, B_d::AbstractMatrix, C_d::AbstractMatrix, len::Int)
    colIdx = (blockIdx().x-1)*blockDim().x + threadIdx().x
    rowIdx = (blockIdx().y-1)*blockDim().y + threadIdx().y
    
    As = @cuStaticSharedMem(T::Float32, (blockDim().y, blockDim().x))
    Bs = @cuStaticSharedMem(T::Float32, (blockDim().y, blockDim().x))
    
    tmp = 0
    
    for i = 0:blockDim().x:len-1
        # Loading into shared memory
        As[threadIdx().y, threadIdx().x] = A_d[rowIdx, i+threadIdx().x]
        Bs[threadIdx().y, threadIdx().x] = B_d[i+threadIdx().y, colIdx]
        
        sync_threads()
        
        # MatMul
        for j=1:blockDim().x
            tmp += As[threadIdx().y, j] * Bs[j, threadIdx().x]
        end
        
        sync_threads()
    end
    
    C_d[rowIdx, colIdx] = tmp
    
    return nothing
end

The error that I’m getting is as follows:

GPU compilation of kernel MatMulGPU!(CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float32, 1}, Int64) failed
KernelError: kernel returns a value of type `Union{}`

Make sure your kernel function ends in `return`, `return nothing` or `nothing`.
If the returned value is of type `Union{}`, your Julia code probably throws an exception.
Inspect the code with `@device_code_warntype` for more details.

I can’t figure out the mistake in my code (I’m ending my kernel with `return nothing`).

Thanks

Try reducing your kernel.

Did you try the suggestion:

If the returned value is of type `Union{}`, your Julia code probably throws an exception.
Inspect the code with `@device_code_warntype` for more details.

Also see the documentation: Troubleshooting · CUDA.jl
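
To illustrate what that hint in the error message means, here is a small CPU-only sketch (no GPU needed). The function names `always_throws` and `returns_normally` are made up for this example. It only shows that Julia infers the return type `Union{}` for a function whose every code path throws; it does not say which statement in your kernel is the culprit.

```julia
# Hypothetical function: every code path ends in an exception.
always_throws(x::Int) = error("boom")

# Hypothetical function that can actually return a value.
returns_normally(x::Int) = x + 1

# Base.return_types lists the inferred return type of each matching method.
throwing_type = first(Base.return_types(always_throws, (Int,)))
normal_type = first(Base.return_types(returns_normally, (Int,)))

println(throwing_type)  # Union{}
println(normal_type)
```

A `Union{}` on a variable or on the whole body therefore tells you that inference concluded the code cannot get past some call without throwing.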

What do you mean by reducing the kernel?

I tried @device_code_warntype, and it prints something like this:

PTX CompilerJob of kernel MatMulGPU!(CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float32, 1}, Int64) for sm_75


│ ─ %-1  = invoke MatMulGPU!(::CuDeviceMatrix{Float64, 1},::CuDeviceMatrix{Float64, 1},::CuDeviceMatrix{Float32, 1},::Int64)::Union{}
Variables
  #self#::Core.Const(MatMulGPU!)
  A_d::CuDeviceMatrix{Float64, 1}
  B_d::CuDeviceMatrix{Float64, 1}
  C_d::CuDeviceMatrix{Float32, 1}
  len::Int64
  @_6::Union{}
  tmp::Union{}
  Bs::Union{}
  As::Union{}
  rowIdx::Union{}
  colIdx::Union{}
  @_12::Union{}
  i::Union{}
  j::Union{}

Body::Union{}
    @ In[13]:2 within `MatMulGPU!'
1 ─     invoke Main.blockIdx()
└──     unreachable

I’ve been trying to see what’s wrong with the code but I can’t find anything.

Reducing means removing code until the kernel compiles, in order to find the problematic statement.
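
As a rough sketch of what that could look like here: the `@device_code_warntype` output above already stops at the `blockIdx()` call on the first line of the kernel, so a first reduction can keep just that statement. This assumes CUDA.jl is loaded with `using CUDA` and that a working GPU is available; the name `reduced_kernel!` and the 16×16 sizes are made up for illustration.

```julia
using CUDA

# Hypothetical reduced kernel: only the first statement of MatMulGPU! is kept.
function reduced_kernel!(C_d)
    colIdx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    return nothing
end

# Small dummy output array, just so the kernel has a device argument.
C_d = CUDA.zeros(Float32, 16, 16)

# Compile and launch while printing the typed code for inspection.
@device_code_warntype @cuda threads=(16, 16) reduced_kernel!(C_d)
```

If this still reports `Union{}`, the problem is in that one statement or in the environment rather than in the rest of the kernel. If it compiles, add the removed statements back one at a time until the error returns.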