Weird behavior in @cuda kernel dispatch

Consider the following dummy kernel function that accepts an argument of type CuVector.

julia> using CUDA

julia> function my_kernel(v::CuVector)
           nothing
       end
my_kernel (generic function with 1 method)

julia> v = CUDA.randn(10)

julia> v isa CuVector
true

julia> my_kernel(v)

The above code works as expected. However, if I launch the kernel with @cuda:

julia> @cuda my_kernel(v)
ERROR: MethodError: no method matching my_kernel(::CuDeviceVector{Float32, 1})
Closest candidates are:
  my_kernel(::CuArray{T, 1} where T) at REPL[2]:1
Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/GPUCompiler/XwWPj/src/cache.jl:0 [inlined]
 [2] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(my_kernel), Tuple{CuDeviceVector{Float32, 1}}}}, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
   @ GPUCompiler ~/.julia/packages/GPUCompiler/XwWPj/src/cache.jl:70
 [3] cufunction(f::typeof(my_kernel), tt::Type{Tuple{CuDeviceVector{Float32, 1}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ CUDA ~/.julia/packages/CUDA/Zmd60/src/compiler/execution.jl:294
 [4] cufunction(f::typeof(my_kernel), tt::Type{Tuple{CuDeviceVector{Float32, 1}}})
   @ CUDA ~/.julia/packages/CUDA/Zmd60/src/compiler/execution.jl:288
 [5] top-level scope
   @ ~/.julia/packages/CUDA/Zmd60/src/compiler/execution.jl:102

Why does it complain no matching method? Note that v isa CuVector returns true.

Version info:

Julia Version 1.6.0-rc1
Commit a58bdd9010 (2021-02-06 15:49 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-1603 v3 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, haswell)

and CUDA v2.6.1.

CuVector and CuArray are host representations of CUDA memory buffers. Because of this, they carry a lot of metadata and extraneous fields that don’t make any sense in the context of a CUDA kernel (which expects little more than an unadorned pointer to some device memory). That’s the role CuDeviceVector and CuDeviceArray fill.

When you invoke a kernel with @cuda, CUDA.jl will auto-convert arguments between host and device arrays. However, since my_kernel only accepts host arrays, Julia throws a MethodError when CUDA.jl tries to invoke it as GPU-side code with a device array. The easiest way to resolve this is to remove the type annotation from my_kernel, since you’re likely only ever going to call it via @cuda anyhow.
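To make the dispatch failure concrete without needing a GPU, here is a CPU-only analogy. All the names in it (HostVec, DeviceVec, to_device, launch) are made up for illustration: HostVec plays the role of CuVector, DeviceVec the role of CuDeviceVector, to_device the role of cudaconvert, and launch the role of @cuda, which converts its arguments before dispatching.

```julia
# Hypothetical stand-ins for the host and device array types.
struct HostVec;   data::Vector{Float32};   end
struct DeviceVec; ptr::Ptr{Float32}; len::Int; end

to_device(v::HostVec) = DeviceVec(pointer(v.data), length(v.data))  # like cudaconvert
launch(f, args...) = f(map(to_device, args)...)                     # like @cuda

kernel_typed(v::HostVec) = :host   # only matches the host wrapper
kernel_untyped(v) = :any           # matches anything, like the suggested fix

h = HostVec(zeros(Float32, 4))
kernel_typed(h)               # works: direct call with a HostVec
launch(kernel_untyped, h)     # works: the untyped method accepts a DeviceVec
try
    launch(kernel_typed, h)   # fails: no method for DeviceVec
catch e
    println(e isa MethodError)
end
```

The last call reproduces the MethodError from the original post: by the time dispatch happens, the argument is no longer of the type the signature demands.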


Many thanks for the explanation. You are totally right.

In the documentation of @cuda (Compiler · CUDA.jl) there is:

It will be compiled to a CUDA function upon first use, and to a certain extent arguments will be converted and managed automatically using cudaconvert .

while

julia> CUDA.cudaconvert(CUDA.rand(3)) |> typeof
CuDeviceVector{Float32, 1} (alias for CuDeviceArray{Float32, 1, 1})

In addition

julia> supertype(CuDeviceVector)
AbstractVector{T} where T (alias for AbstractArray{T, 1} where T)

An alternative way is thus to declare the argument type to be AbstractVector.
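For example, a sketch of that alternative (the @cuda lines are commented out because they require CUDA.jl and a CUDA-capable GPU):

```julia
# Annotating with AbstractVector covers both the host array (CuVector)
# and the device array (CuDeviceVector), since both subtype AbstractVector.
function my_kernel(v::AbstractVector)
    return nothing
end

my_kernel(zeros(Float32, 10))    # a plain Vector also matches

# With CUDA loaded and a GPU available, both of these should work too:
#   my_kernel(CUDA.randn(10))    # host call with a CuVector
#   @cuda my_kernel(CUDA.randn(10))  # device call with a CuDeviceVector
```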
