Too many resources requested for launch

I get “too many resources requested for launch” in a CUDA.jl kernel when I try to either

set a value in an array that lives in global memory, like


or print anything using

CUDA.@cuprint "aa"

I suspect that it is because the number of registers used is too high, so:

  1. I suppose that in this case it may be related to caching results fetched from global memory - can I switch it off?
  2. Can I run GC on variables in register memory, so that I could manually clear them before the problem arises? (CUDA.unsafe_free!() seems to work on arrays only - am I wrong?)
  3. How can I increase maxrregcount in CUDA.jl?

Please don’t double-post. Here’s my response from Slack:

can’t you just launch fewer threads? it’s recommended to use the occupancy API to have it automatically decide how many threads to launch

the register limit is an argument to @cuda (it's called maxregs there, the counterpart of nvcc's maxrregcount), but lowering it will result in spilling, so your kernel will be slow
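For reference, a minimal sketch of passing a register cap to @cuda. The kernel `fill_kernel!`, the array size, and the value 32 are made up for illustration; the keyword is assumed to be `maxregs`, as in current CUDA.jl releases.

```julia
using CUDA

# Hypothetical kernel, for illustration only: each thread writes one element.
function fill_kernel!(a, val)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        @inbounds a[i] = val
    end
    return nothing
end

a = CUDA.zeros(Float32, 1024)

# maxregs caps the registers each thread may use. Values that do not fit
# spill to (slow) local memory, so treat this as a workaround, not a fix.
@cuda threads=256 blocks=4 maxregs=32 fill_kernel!(a, 1f0)
```

A cap like this trades launchability for speed, so it is worth measuring the kernel before and after.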

you need to try and reduce register pressure, e.g., by avoiding converting values between types, or throwing exceptions, etc. inspect the generated code for that
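A rough sketch of how that inspection could look, assuming a recent CUDA.jl. The kernel `scale_kernel!` is a hypothetical stand-in for the real one; `CUDA.registers`, `CUDA.maxthreads` and `CUDA.memory` query the compiled kernel, and `@device_code_warntype` prints the typed code, where conversions and exception paths tend to show up.

```julia
using CUDA

# Hypothetical kernel, standing in for the one that fails to launch.
function scale_kernel!(a, factor)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        @inbounds a[i] *= factor
    end
    return nothing
end

a = CUDA.ones(Float32, 1024)

# Compile without launching, then ask the compiled kernel about its resources.
kernel = @cuda launch=false scale_kernel!(a, 2f0)
println("registers per thread:  ", CUDA.registers(kernel))
println("max threads per block: ", CUDA.maxthreads(kernel))
println("memory usage:          ", CUDA.memory(kernel))

# Print the typed code for the kernel to spot type conversions and throws.
CUDA.@device_code_warntype @cuda launch=false scale_kernel!(a, 2f0)
```

If the reported maximum thread count is below the number of threads requested at launch, that alone explains the error.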

GC doesn’t apply at all here

as a last-resort solution you can try splitting the kernel

but you probably don’t need all that, just make sure your kernel can handle any thread count and use the occupancy API
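A minimal sketch of that last suggestion, assuming a recent CUDA.jl. The kernel `set_ones!` and the array size are hypothetical; the grid-stride loop is what lets the kernel work with any thread count, and `launch_configuration` is the occupancy API call that suggests one.

```julia
using CUDA

# Hypothetical kernel with a grid-stride loop: correct for any launch size.
function set_ones!(a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    while i <= length(a)
        @inbounds a[i] = 1f0
        i += stride
    end
    return nothing
end

a = CUDA.zeros(Float32, 10_000)

# Compile without launching, then let the occupancy API suggest a
# configuration that fits the kernel's register usage.
kernel = @cuda launch=false set_ones!(a)
config = launch_configuration(kernel.fun)
threads = min(length(a), config.threads)
blocks = cld(length(a), threads)

kernel(a; threads, blocks)
```

Because the thread count comes from the compiled kernel itself, it stays within the register budget even if the kernel later grows.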
