I have a kernel function that I pass to `@cuda` in CUDAnative.jl. It takes 16 arguments:
```julia
function TrAXBY_CUDA!(cuTMP,G1,G2,Ind2,IX,JX,VX,IY,JY,VY,nΩ,nK,nnzX,nnzY,dimX,dimY)
    # ...
    return nothing
end
```
Using `@device_code_warntype`, I got the following output and error:
```
Body::Union{}
2 1 ─ %1 = Base.llvmcall::Core.IntrinsicFunction │╻╷╷╷ macro expansion
│ %1(Ptr{Nothing} @0x0000000003e53598, Ptr{Complex{Float64}}, Tuple{})
│ $(Expr(:throw_undef_if_not, :tid, false)) ││
└── unreachable ││
┌ Error: invalid kernel call; too many arguments
│ kernel = typeof(TrAXBY_CUDA!)
│ argc = 16
└ @ CUDAnative utils.jl:14
ERROR: LoadError: GPU compilation failed, try inspecting generated code with any of the @device_code_... macros
CompilerError: could not compile TrAXBY_CUDA!(CuDeviceArray{Complex{Float64},4,CUDAnative.AS.Global}, CuDeviceArray{Float64,4,CUDAnative.AS.Global}, CuDeviceArray{Float64,4,CUDAnative.AS.Global}, CuDeviceArray{Int64,1,CUDAnative.AS.Global}, CuDeviceArray{Int64,2,CUDAnative.AS.Global}, CuDeviceArray{Int64,2,CUDAnative.AS.Global}, CuDeviceArray{Float64,2,CUDAnative.AS.Global}, CuDeviceArray{Int64,2,CUDAnative.AS.Global}, CuDeviceArray{Int64,2,CUDAnative.AS.Global}, CuDeviceArray{Float64,2,CUDAnative.AS.Global}, Int64, Int64, Int64, Int64, Int64, Int64); kernel returns a value of type Any
Stacktrace:
[1] validate_invocation(::CUDAnative.CompilerContext) at /home/yunlong/.julia7/packages/CUDAnative/pfAo/src/validation.jl:15
[2] compile_function(::CUDAnative.CompilerContext) at ./logging.jl:317
[3] #cufunction#85(::Base.Iterators.Pairs{Symbol,typeof(TrAXBY_CUDA!),Tuple{Symbol},NamedTuple{(:inner_f,),Tuple{typeof(TrAXBY_CUDA!)}}}, ::Function, ::CuDevice, ::Function, ::Type) at /home/yunlong/.julia7/packages/CUDAnative/pfAo/src/compiler.jl:655
[4] (::getfield(CUDAnative, Symbol("#kw##cufunction")))(::NamedTuple{(:inner_f,),Tuple{typeof(TrAXBY_CUDA!)}}, ::typeof(cufunction), ::CuDevice, ::Function, ::Type) at ./none:0
[5] _cuda(::CUDAnative.KernelWrapper{typeof(TrAXBY_CUDA!)}, ::typeof(TrAXBY_CUDA!), ::Tuple{}, ::NamedTuple{(:threads, :blocks),Tuple{Tuple{Int64,Int64,Int64},Tuple{Int64,Int64,Int64}}}, ::CuDeviceArray{Complex{Float64},4,CUDAnative.AS.Global}, ::CuDeviceArray{Float64,4,CUDAnative.AS.Global}, ::CuDeviceArray{Float64,4,CUDAnative.AS.Global}, ::CuDeviceArray{Int64,1,CUDAnative.AS.Global}, ::CuDeviceArray{Int64,2,CUDAnative.AS.Global}, ::CuDeviceArray{Int64,2,CUDAnative.AS.Global}, ::CuDeviceArray{Float64,2,CUDAnative.AS.Global}, ::CuDeviceArray{Int64,2,CUDAnative.AS.Global}, ::CuDeviceArray{Int64,2,CUDAnative.AS.Global}, ::CuDeviceArray{Float64,2,CUDAnative.AS.Global}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64) at /home/yunlong/.julia7/packages/CUDAnative/pfAo/src/execution.jl:235
[6] macro expansion at ./gcutils.jl:89 [inlined]
[7] top-level scope at /home/yunlong/.julia7/packages/CUDAnative/pfAo/src/reflection.jl:154 [inlined]
[8] top-level scope at ./<missing>:0
[9] include at ./boot.jl:317 [inlined]
[10] include_relative(::Module, ::String) at ./loading.jl:1075
[11] include(::Module, ::String) at ./sysimg.jl:29
[12] include(::String) at ./client.jl:393
[13] top-level scope at none:0
...
```
Is this because I am passing too many arguments? What is the limit on the number of arguments a CUDA kernel can take?
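In case the argument count itself turns out to be the problem, I was wondering whether packing related arguments into tuples would be a valid workaround, so the kernel receives fewer top-level arguments. Below is a CPU-only sketch of the idea; all names, sizes, and the toy body are made up for illustration and are not from my real kernel:

```julia
# Sketch (assumption, not the real kernel): group the sparse-matrix triplets
# and the scalar dimensions into tuples, so the call passes 4 arguments
# instead of 16. On the GPU this would be launched with @cuda instead of
# called directly.
function kernel_packed!(out, X, Y, dims)
    IX, JX, VX = X                      # triplet (rows, cols, values) for X
    IY, JY, VY = Y                      # triplet (rows, cols, values) for Y
    nnzX, nnzY = dims                   # scalar sizes packed together
    # toy body standing in for the real work: sum all values into out[1]
    s = 0.0
    for k in 1:nnzX
        s += VX[k]
    end
    for k in 1:nnzY
        s += VY[k]
    end
    out[1] = s
    return nothing
end

out = zeros(1)
X = ([1, 2], [1, 2], [0.5, 1.5])
Y = ([1], [1], [2.0])
kernel_packed!(out, X, Y, (2, 1))
# out[1] == 4.0
```

Would tuples of `CuDeviceArray`s like this be passed through `@cuda` correctly, or does each array still count toward the argument limit?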
@maleadt