What is the maximal number of arguments a CUDAnative kernel can take? argc = 16 yields "Error: invalid kernel call; too many arguments"

Lian_Yunlong · July 3, 2018, 7:58pm

I have a kernel to pass to @cuda in CUDAnative.jl. It is a kernel function which takes 16 arguments:

function TrAXBY_CUDA!(cuTMP,G1,G2,Ind2,IX,JX,VX,IY,JY,VY,nΩ,nK,nnzX,nnzY,dimX,dimY)
         ... ...
         return nothing
end

Using @device_code_warntype, I got the following error:

Body::Union{}
2 1 ─ %1 = Base.llvmcall::Core.IntrinsicFunction                 │╻╷╷╷ macro expansion
  │        %1(Ptr{Nothing} @0x0000000003e53598, Ptr{Complex{Float64}}, Tuple{})
  │        $(Expr(:throw_undef_if_not, :tid, false))             ││   
  └──      unreachable                                           ││   
┌ Error: invalid kernel call; too many arguments
│   kernel = typeof(TrAXBY_CUDA!)
│   argc = 16
└ @ CUDAnative utils.jl:14
ERROR: LoadError: GPU compilation failed, try inspecting generated code with any of the @device_code_... macros
CompilerError: could not compile TrAXBY_CUDA!(CuDeviceArray{Complex{Float64},4,CUDAnative.AS.Global}, CuDeviceArray{Float64,4,CUDAnative.AS.Global}, CuDeviceArray{Float64,4,CUDAnative.AS.Global}, CuDeviceArray{Int64,1,CUDAnative.AS.Global}, CuDeviceArray{Int64,2,CUDAnative.AS.Global}, CuDeviceArray{Int64,2,CUDAnative.AS.Global}, CuDeviceArray{Float64,2,CUDAnative.AS.Global}, CuDeviceArray{Int64,2,CUDAnative.AS.Global}, CuDeviceArray{Int64,2,CUDAnative.AS.Global}, CuDeviceArray{Float64,2,CUDAnative.AS.Global}, Int64, Int64, Int64, Int64, Int64, Int64); kernel returns a value of type Any
Stacktrace:
 [1] validate_invocation(::CUDAnative.CompilerContext) at /home/yunlong/.julia7/packages/CUDAnative/pfAo/src/validation.jl:15
 [2] compile_function(::CUDAnative.CompilerContext) at ./logging.jl:317
 [3] #cufunction#85(::Base.Iterators.Pairs{Symbol,typeof(TrAXBY_CUDA!),Tuple{Symbol},NamedTuple{(:inner_f,),Tuple{typeof(TrAXBY_CUDA!)}}}, ::Function, ::CuDevice, ::Function, ::Type) at /home/yunlong/.julia7/packages/CUDAnative/pfAo/src/compiler.jl:655
 [4] (::getfield(CUDAnative, Symbol("#kw##cufunction")))(::NamedTuple{(:inner_f,),Tuple{typeof(TrAXBY_CUDA!)}}, ::typeof(cufunction), ::CuDevice, ::Function, ::Type) at ./none:0
 [5] _cuda(::CUDAnative.KernelWrapper{typeof(TrAXBY_CUDA!)}, ::typeof(TrAXBY_CUDA!), ::Tuple{}, ::NamedTuple{(:threads, :blocks),Tuple{Tuple{Int64,Int64,Int64},Tuple{Int64,Int64,Int64}}}, ::CuDeviceArray{Complex{Float64},4,CUDAnative.AS.Global}, ::CuDeviceArray{Float64,4,CUDAnative.AS.Global}, ::CuDeviceArray{Float64,4,CUDAnative.AS.Global}, ::CuDeviceArray{Int64,1,CUDAnative.AS.Global}, ::CuDeviceArray{Int64,2,CUDAnative.AS.Global}, ::CuDeviceArray{Int64,2,CUDAnative.AS.Global}, ::CuDeviceArray{Float64,2,CUDAnative.AS.Global}, ::CuDeviceArray{Int64,2,CUDAnative.AS.Global}, ::CuDeviceArray{Int64,2,CUDAnative.AS.Global}, ::CuDeviceArray{Float64,2,CUDAnative.AS.Global}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64) at /home/yunlong/.julia7/packages/CUDAnative/pfAo/src/execution.jl:235
 [6] macro expansion at ./gcutils.jl:89 [inlined]
 [7] top-level scope at /home/yunlong/.julia7/packages/CUDAnative/pfAo/src/reflection.jl:154 [inlined]
 [8] top-level scope at ./<missing>:0
 [9] include at ./boot.jl:317 [inlined]
 [10] include_relative(::Module, ::String) at ./loading.jl:1075
 [11] include(::Module, ::String) at ./sysimg.jl:29
 [12] include(::String) at ./client.jl:393
 [13] top-level scope at none:0
...
...

Is it because I am passing too many arguments? What is the limit on the number of arguments a CUDA kernel can take?
@maleadt

maleadt · July 3, 2018, 8:37pm

Max 13 arguments. Enforced by CUDAnative, but at its core due to Julia’s type inference limits.
An easy workaround is to pass a tuple instead, and with 0.7 you can easily destructure that tuple into variables again:

using CUDAnative

kernel((a,b,c,d,e,f,g,h,i,j,k,l,m,n)) = nothing

@cuda kernel((1,2,3,4,5,6,7,8,9,10,11,12,13,14))
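The same pattern scales to the 16 values from the original question. Below is a minimal CPU-only sketch, assuming Julia 0.7 or later for argument destructuring. The name `packed_sum!` and the integer inputs are hypothetical stand-ins for the real kernel and its arrays, and the `@cuda` launch is left out so the snippet runs without a GPU.

```julia
# Hypothetical stand-in for a 16-argument kernel body; runs on the CPU.
# The second parameter is a single tuple, destructured in the signature.
function packed_sum!(out, (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p))
    out[1] = a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p
    return nothing   # kernels must return nothing
end

out = zeros(Int, 1)
# Two arguments at the call site: the output array and one 16-element tuple.
packed_sum!(out, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16))
```

Packing keeps the argument count seen by the launcher small, while the body still works with individually named variables.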

ChrisRackauckas · July 3, 2018, 8:39pm

That was fixed in v0.7. Is the limitation for CUDAnative removed as well?

maleadt · July 3, 2018, 8:43pm

Ah, I missed the merging of that PR. There’s still going to be a limit though, unless splatting also works properly now (Cassette-type inference problems).

ChrisRackauckas · July 3, 2018, 8:50pm

Oh, you splat too? The tuple limit was eliminated and the splat limit was increased to 32:

github.com/JuliaLang/julia

Bump tuple inference length cutoff from 16 to 32

JuliaLang:master ← JuliaLang:ajf/infer-ntuple-32

opened 05:16AM - 03 Jun 18 UTC

andyferris

+13 -43

* ~~Bumps `tupletype_len` to `31`~~ Removes `tupletype_len` entirely.
* Bumps `tuple_splat` to `32`
* ~~Similarly for the types `Any16` -> `Any32` and `All16` -> `All32`.~~

For context, I feel this would be useful for working with heterogeneously typed data as tuples and named tuples. In particular, for v0.7/v1.0, simple containers such as `Vector{NamedTuple{...}}` could be versatile, performant containers for tables and data (similarly for named tuples of arrays, etc), and at times the existing limits (where practically speaking it's `14` elements being the biggest size that gives "full run-time speed") felt a bit limiting (e.g. a table with 15-30 columns doesn't seem particularly unreasonable, though for very large numbers I admit that switching to a more dynamic data structure might be preferable).

Incidentally, this might help with things like arrays with 15+ dimensions and so on (#20163) (cc @Jutho). Having `30` dimensions as the maximum with fully-inferred code seems a somewhat reasonable cutoff number to me, giving ~1 billion elements for a 2x2x2x... sized array, as in #20163.

Of course, I'm more than a bit ignorant of what other impacts this may have internally for the compiler (compile time speed will obviously be slower in some situations) but I thought this might be worthwhile floating for inclusion in v0.7.

Still much better.
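To make the tuple-versus-splat distinction concrete, here is a small CPU-only sketch. The helper names `total` and `total_splat` are hypothetical. What inference reports for the splatted call depends on the Julia version, so the snippet only queries it rather than claiming a result.

```julia
# Hypothetical helpers; CPU-only.
total(t::Tuple) = sum(t)         # the tuple is passed as one argument
total_splat(xs...) = sum(xs)     # the elements are passed as separate (splatted) arguments

t = ntuple(identity, 20)         # the tuple (1, 2, ..., 20)

by_tuple = total(t)
by_splat = total_splat(t...)

# Ask inference what return type it derives for the splatted call on this Julia version.
inferred = Base.return_types(total_splat, typeof(t))
```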

maleadt · July 4, 2018, 8:40am

OK, I had a quick look: even when destructuring the splat in a generated function, it still runs into issues with 36+ arguments. I didn’t have time to investigate, but it’s still an improvement.
https://github.com/JuliaGPU/CUDAnative.jl/pull/216

ChrisRackauckas · July 4, 2018, 8:46am

I’m fine if it keeps satisfying Moore’s law for argument numbers.
